Entropy of Operator-valued Random Variables: A 
Variational Principle for Large Matrix Models. 



L. Akant, G. S. Krishnaswami and S. G. Rajeev* 
Department of Physics and Astronomy, 
University of Rochester, 
Rochester, New York 14627 

February 1, 2008 



Abstract 

We show that, in 't Hooft's large N limit, matrix models can be formulated
as a classical theory whose equations of motion are the factorized
Schwinger-Dyson equations. We discover an action principle for this
classical theory. This action contains a universal term describing the en- 
tropy of the non-commutative probability distributions. We show that 
this entropy is a nontrivial 1-cocycle of the non-commutative analogue 
of the diffeomorphism group and derive an explicit formula for it. The 
action principle allows us to solve matrix models using novel variational 
approximation methods; in the simple cases where comparisons with other 
methods are possible, we get reasonable agreement. 
1 Introduction



There are many physical theories in which random variables which are operators 
-matrices of finite or infinite order- appear: for example, Yang-Mills theories, 
models for random surfaces and M-theory (an approach to a string theory of 
quantum gravity). In all these theories, the observables are functions of the 
matrices which are invariant under changes of basis; in many cases - as for 
Yang-Mills theories- the invariance group is quite large since it contains changes 
of basis that depend on position. We address the question of how to construct an 
effective action (probability distribution) for these gauge invariant observables 
induced by the original probability distribution of the matrices. 

Quantum Chromodynamics (QCD) is the matrix model of greatest physical 
interest. QCD is the widely accepted theory of strong interactions. It is a
Yang-Mills theory with a non-abelian gauge group SU(N). Thus, the microscopic
degrees of freedom include a set of N x N hermitean matrices at each point of
space-time: the components of the 1-form that represents the gauge field. In
addition there are the quark fields that form an N-component complex vector
at each point of space-time. The number of 'colors', N, is equal to 3 in nature.
Nevertheless, it will be useful to study the theory for an arbitrary value of N.
Also, it will be convenient to regard U(N) rather than SU(N) as the gauge
group.

The microscopic degrees of freedom (quarks and gluons) do not describe the
particles that we can directly observe. Only certain bound states called
hadrons, those that are invariant under the gauge group, are observable. This
phenomenon, called confinement, is one of the deepest mysteries of theoretical
physics.

In earlier papers we have postulated that there is a self-contained theory
of color invariant observables fully equivalent to QCD at all energies and all
values of N. We have called the theory we seek 'Quantum Hadron Dynamics'
and fully constructed it in the case of two-dimensional space-time. Also we
have shown that this theory is a good approximation to four dimensional QCD
applied to Deep Inelastic Scattering: it predicts with good accuracy the parton
distributions observed in experiments.

Certain simplifications of the two-dimensional theory allowed us to eliminate
all gluon (matrix-valued) degrees of freedom. This helped us to construct
two dimensional Quantum Hadron Dynamics quite explicitly. To make further
progress, it is necessary to understand theories in which the degrees of freedom
are N x N matrices. Before studying a full-fledged matrix field theory we need
to understand how to reformulate a theory of a finite number of matrices in
terms of their invariants.^ This is the problem we will solve in this paper.
It is well-known that matrix models simplify enormously in the limit
$N \to \infty$. The quantum fluctuations in the gauge invariant observables in
gauge invariant states can be shown to be of order $1/N$. Thus, as long as we
restrict to gauge invariant observables, in the limit $N \to \infty$ QCD must
tend to some classical theory. This classical theory cannot be Yang-Mills
theory, however, since
the fluctuations in all states (not just the gauge-invariant ones) would vanish in
that limit. An important clue to discovering Quantum Hadron Dynamics would 
be to study its classical limit first. This is the strategy that worked in the case 
of two dimensions. 

The analogue of the field equations of this 'Classical Hadron Dynamics' has
been known for a long time: they are the factorized Schwinger-Dyson equations.
It is natural to ask if there is a variational principle from which these
equations can be derived. Finding this action principle would be a major step
forward in understanding hadronic physics: it would give a formulation of
hadron dynamics in terms of hadronic variables, entirely independent of
Yang-Mills theory. A quantization of the theory based on this action principle
would recover the corrections of order $1/N$. Moreover, we would be able to
derive approximate solutions of the large-N field equations by the variational
method.

Even after the simplifications of the large N limit, generic matrix models

^ Since we understand by now how to deal with the quark degrees of freedom in terms 
of invariants, it is sufficient to consider toy models for pure gauge theory, without vectorial 
degrees of freedom. 
have proved to be not exactly solvable: the factorized Schwinger-Dyson
equations have proved to be generally intractable. Diagrammatic methods have
been pushed to their limit. To make further progress, new approximation methods
are needed, based on algebraic, geometric and probabilistic ideas. Moreover,
the entire theory has to be reformulated in terms of manifestly U(N)-invariant
observables. Thus, the basic symmetry principle that determines the theory has
to be something new: the gauge group acts trivially on these observables. In
previous papers we had suggested that the group $\mathcal{G}$ of automorphisms
of a free algebra, the non-commutative analogue of the diffeomorphism group,
plays this crucial role in such a gauge invariant reformulation of matrix
models. In this paper we finally discover this manifestly gauge invariant
formulation of finite dimensional matrix models. We find that the configuration
space of the theory is a coset space of $\mathcal{G}$, justifying our earlier
anticipation.

If we restrict to observables which are invariant under the action of U(N), we
should expect that the effective action should contain some kind of entropy. The 
situation is analogous to that in statistical mechanics, with the gauge invariant 
observables playing the role of macroscopic variables. However, there are an 
infinite number of such observables in our case. Moreover, there is no reason to 
expect that the systems we are studying are in thermal equilibrium in any sense. 
The entropy should be the logarithm of the volume of the set of all hermitean 
matrices that yield a given set of values for the U(N)-invariant observables.
This physical idea, motivated by Boltzmann's notions in statistical mechanics, 
allows us to derive an explicit formula for entropy. 

Our approach continues the point of view in the physics literature on random
matrices. It should not be surprising that our work has close relations to the
theory of von Neumann algebras (nowadays called non-commutative probability
theory): after all, operators are just matrices of large dimension. Voiculescu
has another, quite remarkable, approach to non-commutative probability
distributions. Our definition in terms of moments and the group of
automorphisms is closer in spirit to the physics literature. Also, the
connection of entropy to the cohomology of the automorphism group is not
evident in that approach. A closer connection between the mathematical and
physical literature should enrich both fields.

Although our primary motivation has been to study toy models of Yang-Mills
theory, the matrix models we study also arise in some other physical problems.
There are several recent reviews that establish these connections, so we make
only some brief comments. See e.g., [15].

In the language of string theory, what we seek is the action of closed string
field theory. We solve this problem in a 'toy model': strings on a model of
space-time with a finite number of points. We find that closed string theory is
a kind of Wess-Zumino-Witten model on the coset space
$\mathcal{G}/\mathcal{SG}$; we discover an explicit formula for the classical
action, including a term which represents an anomaly, a nontrivial 1-cocycle.
We hope that our work complements the other approaches to closed string field
theory.

Random matrices also appear in another approach to quantum geometry.
Our variational method could be useful to approximately solve these matrix
models for Lorentzian geometry. 

Quantum chaos is often modelled by matrix models. In that context
the focus is often on universal properties that are independent of the particular 
choice of the matrix model action ( which we call S below) . These universal 
properties are thus completely determined by the entropy. Our discovery of an 
explicit formula for entropy should help in deriving such universal properties 
for multi-matrix models: so far results have been mainly about the one-matrix 
model. In the current paper our focus is on the joint probability distribution, 
which is definitely not universal. 

A preliminary version of this paper was presented at the MRST 2001
conference.

^ $\mathcal{G}$ is a non-commutative analogue of the diffeomorphism group; $\mathcal{SG}$ is the subgroup that
preserves a non-commutative analogue of volume. See below for precise definitions. 
2 Operator-valued Random Variables



Let $\xi_i$, $i = 1, \dots, M$, be a collection of operator-valued random
variables. We can assume without any loss of generality that they are hermitean
operators: we can always split any operator into its 'real' and 'imaginary'
parts. If the $\xi_i$ were real-valued, we could have described their joint
probability distribution as a density (more generally a measure) on
$\mathbb{R}^M$. When the $\xi_i$ are operators that cannot be diagonalized
simultaneously, this is not a meaningful notion; we must seek another
definition for the notion of joint probability distribution (jpd).

The quantities of physical interest are the expectation values of functions of
the basic variables (generators) $\xi_i$; the jpd simply provides a rule to
calculate these expectation values. We will think of these functions as
polynomials in the generators (more precisely, formal power series). Thus, each
random variable will be determined by a collection of tensors
$u = \{u^{\emptyset}, u^{i}, u^{i_1 i_2}, \dots\}$ which are the coefficients in
its expansion in terms of the generators:

$$u(\xi) = \sum_{m=0}^{\infty} u^{i_1 \cdots i_m}\, \xi_{i_1} \cdots \xi_{i_m}.$$

The constant term is just a complex number: the set of indices on it is empty. 

If u is a polynomial, all except a finite number of the tensors will be zero. It is 
inconvenient to restrict to polynomials: we would not be able to find inverses of 
functions, for example, within the world of polynomials. The opposite extreme 
would be to impose no restriction at all on the tensors u: then the random 
variable is thought of as a formal power series. We pay a price for this: it is no 
longer possible to 'evaluate' the above infinite series for any particular collection 
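of operators $\xi_i$: the series may not converge. Nevertheless, it makes sense to
take linear combinations and to multiply such formal power series:

$$[\alpha u + \beta v]^{i_1 \cdots i_m} = \alpha\, u^{i_1 \cdots i_m} + \beta\, v^{i_1 \cdots i_m}, \qquad [uv]^{i_1 \cdots i_m} = \sum_{n=0}^{m} u^{i_1 \cdots i_n}\, v^{i_{n+1} \cdots i_m}.$$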

Note that even if there are an infinite number of non-zero elements in the
tensors u or v, the sum and product are always given by finite series: there are
no issues of convergence in their definition. Thus the set of formal power series
forms an associative ^ algebra; this is the free algebra $\mathcal{T}_M$ on the
generators $\xi_i$. Note that the multiplication is just the direct product of
tensors.

As we noted above, the joint probability distribution of the $\xi_i$ is just a
rule to calculate the expectation value of an arbitrary function of these
generators. If we restrict to functions of the $\xi_i$ that are polynomial*,
such expectation values are determined by the moments

$$G_{i_1 i_2 \cdots i_n} = \langle \xi_{i_1} \xi_{i_2} \cdots \xi_{i_n} \rangle.$$



If the variables commute among each other, these moments are symmetric
tensors. The most general situation that can arise in physics is that the
$\xi_i$ satisfy no relations at all among each other (in particular they do not
commute) except the associativity of multiplication. In this case the moments
form tensors with no particular symmetry property. All other associative
algebras can be obtained from this 'free algebra' by imposing relations (i.e.,
quotienting by some ideal of the free algebra). Such relations can be expressed
as conditions on the moments. For example, if
$\xi_i \xi_j = R_{ij}^{kl}\, \xi_k \xi_l$, the moment tensors will satisfy
conditions

$$G_{I\, ij\, J} = R_{ij}^{kl}\, G_{I\, kl\, J}$$

involving neighboring indices.
2.1 The Space of Paths 

Thus, in our theory, a random variable is a tensor
$u = (u^{\emptyset}, u^{i}, u^{i_1 i_2}, \dots)$. We can regard each sequence of
indices $I = i_1 i_2 \cdots i_m$ as a path in the finite set
$\{1, 2, \dots, M\}$ in which the indices take their values; a tensor is then a
function on this space of paths. Now, given two paths $I = i_1 \cdots i_m$ and
$J = j_1 \cdots j_n$, we can concatenate them, traversing $I$ first and then
$J$: $IJ = i_1 \cdots i_m j_1 \cdots j_n$. This

^ This algebra is commutative only if the number of generators M is one. 

* Not all formal power series may have finite expectation values: the series might diverge.
This need not worry us: there is a sufficiently large family of 'well-behaved' random
variables, the polynomials.






concatenation operation is associative but not in general commutative; it has 
the empty path as an identity element. There is however no inverse operation, 
so the concatenation defines the structure of a semi-group on the space of paths. 

We will use upper case latin indices to denote sequences of indices or paths. 
Repeated indices are to be summed as usual; for example 

$$u^I G_I = \sum_{m=0}^{\infty} u^{i_1 \cdots i_m}\, G_{i_1 \cdots i_m}.$$ (5)

Also, define $\delta^{I_1 I_2}_{I}$ to be one if the paths $I_1$ and $I_2$
concatenate to give the path $I$, and zero otherwise.
$\bar{I} = i_m i_{m-1} \cdots i_1$ denotes the reverse path. Now we
can see that the direct product on tensors is just the multiplication induced by
concatenation on the space of paths:

$$[uv]^I = \delta^{I}_{I_1 I_2}\, u^{I_1} v^{I_2}.$$ (6)
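The product induced by concatenation can be sketched in a few lines of Python. This is only an illustration, not part of the original construction: the dictionary representation (a path is a tuple of generator indices, the empty tuple is the empty path) and the function name `multiply` are assumptions made here.

```python
from collections import defaultdict

def multiply(u, v):
    """Product of two polynomials in non-commuting generators.

    u, v: dict mapping a path (tuple of generator indices) to a coefficient.
    Implements [uv]^I = sum over splittings I = I1 I2 of u^{I1} v^{I2}.
    """
    w = defaultdict(int)
    for path1, a in u.items():
        for path2, b in v.items():
            w[path1 + path2] += a * b  # tuple concatenation = path concatenation
    return {path: c for path, c in w.items() if c != 0}

# Hypothetical example: u = 1 + xi_1, v = xi_1 + 2 xi_2 xi_1
u = {(): 1, (1,): 1}
v = {(1,): 1, (2, 1): 2}
uv = multiply(u, v)
vu = multiply(v, u)
```

With more than one generator the two orderings `uv` and `vu` differ, while the product stays associative, as the text states.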



A more refined notion of a path emerges if we regard the indices i as labelling
the edges of a directed graph; a sequence $I = i_1 i_2 i_3 \cdots i_n$ is a path
only when $i_1$ is incident to $i_2$, $i_2$ is incident to $i_3$, etc. The space
of paths associated with a directed graph is still a semigroup; the associative
algebra induced by concatenations is just the algebra of functions on this
semi-group. Lattice gauge theory can be interpreted as a matrix model (of
unitary matrices) on a directed graph ^ that approximates space-time; for
example the cubic lattice.

The free algebra arises from the graph with one vertex, with every edge 
connecting that vertex to itself; that is why every edge is incident to every 
other edge. This is the case we will mostly consider in this paper. Other cases 
can be developed analogously. 

2.2 Non-commutative Probability Distributions 

We define the 'non-commutative joint probability distribution' of the variables
$\xi_i$ to be a collection of tensors $G_{\emptyset}, G_i, G_{i_1 i_2}, \dots$ satisfying the normalization

^ Directed graphs approximating space-time are called 'lattices' in physics terminology. 
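condition

$$G_{\emptyset} = \langle 1 \rangle = 1,$$ (7)

the hermiticity condition

$$G^*_{i_1 i_2 \cdots i_m} = G_{i_m i_{m-1} \cdots i_1}, \quad \text{i.e., } G^*_I = G_{\bar{I}},$$ (8)

as well as the positivity condition

$$\sum_{m,n=0}^{\infty} G_{i_m \cdots i_1 j_1 \cdots j_n}\, u^{i_1 \cdots i_m *}\, u^{j_1 \cdots j_n} \geq 0, \quad \text{i.e., } G_{\bar{I} J}\, u^{I*} u^{J} \geq 0,$$ (9)

for any polynomial
$u(\xi) = u^{\emptyset} + u^{i} \xi_i + u^{i_1 i_2} \xi_{i_1} \xi_{i_2} + \cdots$.
Denote by $\mathcal{P}_M$ the space of such non-commutative probability
distributions. We define the expectation value of a polynomial
$v(\xi) = \sum_m v^{i_1 \cdots i_m}\, \xi_{i_1} \cdots \xi_{i_m}$ to be

$$\langle v(\xi) \rangle = \sum_m v^{i_1 \cdots i_m}\, G_{i_1 \cdots i_m}.$$ (10)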



If the variables are commutative with joint pdf $p(x_1, \dots, x_M)\, d^M x$,

$$G_{i_1 i_2 \cdots i_n} = \int x_{i_1} \cdots x_{i_n}\, p(x_1, \dots, x_M)\, d^M x;$$ (11)

it is clear then that the above conditions on G follow from the usual
normalization, hermiticity and positivity conditions on $p(x)\, d^M x$. For
example, the contravariant tensors $u^{\emptyset}, u^{i}, u^{i_1 i_2}, \dots$
define a polynomial

$$u(x) = \sum_m u^{i_1 \cdots i_m}\, x_{i_1} \cdots x_{i_m},$$ (12)

and the quantity on the lhs of the positivity condition above is just the
expectation value of $u^*(x)\, u(x)$. In this case, the moment tensors will be
real and symmetric. Up to some technical assumptions, the pdf $p(x)\, d^M x$ is
determined uniquely by the collection of moments $G_{i_1 \cdots i_n}$. (In the
case of a single variable, the reconstruction of the pdf from the moments is
the 'moment problem' of classical analysis; it was solved at varying levels of
generality by Markov, Chebycheff etc. See the excellent book by Akhiezer.)
In the non-commutative case, the pdf no longer makes sense; even then, the
moments allow us to calculate expectation values of arbitrary polynomials. This 
motivates our definition. 

In the cases of interest to us in this paper, $G_I$ is cyclically symmetric. This
corresponds to closed string theory or to glueball states of QCD. Open strings, or
mesons, would require us to study moments which are not cyclically symmetric.
The theory adapts with small changes, but we will not discuss these cases here.



3 Large N Matrix Models 

The basic examples of such operator-valued random variables are random ma- 
trices of large size. 

A matrix model is a theory of random variables which are N x N hermitean
matrices $A_i$, $i = 1, \dots, M$. The matrix elements of each of the $A_i$ are
complex-valued random variables, with the joint probability density function on
$\mathbb{R}^{N^2 M}$

$$\frac{1}{Z(S)}\, e^{N\, \mathrm{tr}\, S(A)}\, d^{N^2 M} A,$$ (13)

where $S(A) = \sum_n S^{i_1 \cdots i_n} A_{i_1} \cdots A_{i_n}$ is a polynomial
called the action. Also, $Z(S)$ is determined by the normalization condition:

$$Z(S) = \int e^{N\, \mathrm{tr}\, S(A)}\, d^{N^2 M} A.$$ (14)

The expectation value of any function of the random variables is defined to be

$$\langle f \rangle = \int f(A)\, e^{N\, \mathrm{tr}\, S(A)}\, \frac{d^{N^2 M} A}{Z(S)}.$$ (15)

The tensors $S^I$ may be chosen to be cyclically symmetric. We assume they
are such that the integrals above converge: $S(A) \to -\infty$ as
$|A| \to \infty$. The interesting observables are invariant under changes of
bases. We can regard the indices $i = 1, \dots, M$ as labelling the links in
some graph; then a sequence $i_1 \cdots i_n$ is a path in this graph. Then
$\Phi_{i_1 \cdots i_n}(A) = \frac{1}{N}\, \mathrm{tr}\, [A_{i_1} \cdots A_{i_n}]$
is a random variable depending on a loop in this graph. For the moment we will
consider every link in the graph to be incident with every other link, so that
all sequences $i_1 \cdots i_n$ are allowed loops. In this case the loop
variables are invariant under simultaneous changes of basis in the basic
variables:

$$A_i \to g A_i g^{\dagger}, \quad g \in U(N).$$ (16)

If we choose some other graph, a sequence of indices is a closed loop only if the
edge $i_2$ is adjacent to $i_1$, $i_3$ is adjacent to $i_2$ and so on. The
invariance group will be larger; as a result, $\langle \Phi_I \rangle$ is
non-zero only for closed loops $I$.

Given S, the moments of the loop variables satisfy the Schwinger-Dyson 
equations 

$$S^{J_1 i J_2}\, \langle \Phi_{J_1 I J_2} \rangle + \delta^{I_1 i I_2}_{I}\, \langle \Phi_{I_1} \Phi_{I_2} \rangle = 0.$$ (17)



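This equation can be derived by considering the infinitesimal change of
variables

$$[\delta_v A]_i = v_i^I A_I, \qquad A_I = A_{i_1} \cdots A_{i_n},$$ (18)

on the integral

$$Z(S) = \int e^{N\, \mathrm{tr}\, S(A)}\, dA.$$ (19)

The variation of a product of A's is easy to compute:

$$[\delta_v A]_J = v_i^I\, \delta_J^{J_1 i J_2}\, A_{J_1} A_I A_{J_2}.$$ (20)

The first term in the Schwinger-Dyson equation follows from the variation of
the action under this change. The second term is more subtle as it is the change
of the measure of integration, the divergence of the vector field $v_i(A)$:

$$\delta_v(dA) = v_i^I\, \frac{\partial [A_I]^a_b}{\partial [A_i]^a_b}\, dA, \qquad \frac{\partial [A_I]^a_b}{\partial [A_i]^a_b} = \delta_I^{I_1 i I_2}\, \mathrm{tr}\, A_{I_1}\, \mathrm{tr}\, A_{I_2}.$$ (21)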
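The divergence formula for the change of measure can be checked on small matrices. The following Python sketch is an illustration under stated assumptions: generators are indexed from 0, matrices are plain nested lists with integer entries, the matrix entries are treated as independent variables, and all function names are hypothetical. It differentiates a word $A_I$ with respect to each entry of $A_i$ by the product rule and compares the resulting divergence with the sum over splittings $I = I_1 i I_2$ of $\mathrm{tr}\, A_{I_1}\, \mathrm{tr}\, A_{I_2}$.

```python
def identity(n):
    return [[int(r == c) for c in range(n)] for r in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def matadd(X, Y):
    n = len(X)
    return [[X[r][c] + Y[r][c] for c in range(n)] for r in range(n)]

def word(mats, path):
    """Product A_{i_1} ... A_{i_n}; the empty path gives the identity."""
    P = identity(len(mats[0]))
    for j in path:
        P = matmul(P, mats[j])
    return P

def trace(X):
    return sum(X[r][r] for r in range(len(X)))

def divergence_direct(mats, i, path):
    """sum_{a,b} d[A_I]^a_b / d[A_i]^a_b, using the product rule entry by entry."""
    n = len(mats[0])
    zero = [[0] * n for _ in range(n)]
    total = 0
    for a in range(n):
        for b in range(n):
            # E is the derivative of A_i with respect to its (a, b) entry
            E = [[int((r, c) == (a, b)) for c in range(n)] for r in range(n)]
            P, dP = identity(n), zero
            for j in path:
                dP = matadd(matmul(dP, mats[j]), matmul(P, E) if j == i else zero)
                P = matmul(P, mats[j])
            total += dP[a][b]
    return total

def divergence_formula(mats, i, path):
    """Sum over splittings I = I1 i I2 of tr A_{I1} * tr A_{I2}."""
    return sum(trace(word(mats, path[:p])) * trace(word(mats, path[p + 1:]))
               for p, j in enumerate(path) if j == i)

# Hypothetical test data: two real symmetric 2 x 2 matrices and one word
A = [[[1, 2], [2, -1]], [[0, 3], [3, 4]]]
path = (0, 1, 0, 0)
```

For any choice of matrices and word, `divergence_direct(A, i, path)` and `divergence_formula(A, i, path)` should agree; note that the trace of the empty word is the matrix size N.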

Returning to the Schwinger-Dyson equations, we see that they are not a 
closed system of equations: the expectation value of the loop variables is related 






to that of the product of two loop variables. However, there is a remarkable
simplification as $N \to \infty$.

In the planar limit $N \to \infty$ keeping $S^{i_1 \cdots i_n}$ fixed, the loop
variables have no fluctuations^:

$$\langle f_1(\Phi)\, f_2(\Phi) \rangle = \langle f_1(\Phi) \rangle\, \langle f_2(\Phi) \rangle,$$ (22)

where $f_1(\Phi)$, $f_2(\Phi)$ are polynomials of the loop variables. This means
that the probability distribution of the loop variables is entirely determined
by the expectation values (moments)

$$G_{i_1 \cdots i_n} = \lim_{N \to \infty} \left\langle \frac{1}{N}\, \mathrm{tr}\, A_{i_1} \cdots A_{i_n} \right\rangle.$$ (23)

Thus we get the factorized Schwinger-Dyson equations:

$$S^{J_1 i J_2}\, G_{J_1 I J_2} + \delta^{I_1 i I_2}_{I}\, G_{I_1} G_{I_2} = 0.$$ (24)

Since the fluctuations in the loop variables vanish in the planar limit, there
must be some effective 'classical theory' for these variables of which these are 
the equations of motion. We now seek a variational principle from which these 
equations follow. 

Matrix models arise as toy models of Yang-Mills theory as well as string 
theory. The cyclically symmetric indices $I = i_1 \cdots i_n$ should be interpreted as
a closed curve in space-time. The observable $\Phi_I$ corresponds to the Wilson loop
in Yang-Mills theory and to the closed string field.

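To summarize, the most important examples of non-commutative probability
distributions are large N matrix models:

$$G_{i_1 \cdots i_n} = \lim_{N \to \infty} \int \frac{1}{N}\, \mathrm{tr}\, [A_{i_1} \cdots A_{i_n}]\, e^{N\, \mathrm{tr}\, S(A)}\, \frac{d^{N^2 M} A}{Z(S)}.$$ (25)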

^ This is called the planar limit, since in perturbation theory, only Feynman diagrams of
planar topology contribute. In the matrix model of random surface theory, one is interested
in another large N limit, the double scaling limit. Here the coupling constants have to
vary as $N \to \infty$ and tend to certain critical values at a specified rate. The fluctuations are
not small in the double scaling limit.
3.1 Example: the Wigner Distribution

The most ubiquitous of all classical probability distributions is the Gaussian;
the non-commutative analogue of this is the Wigner distribution (also called
the semi-circular distribution).

We begin with the simplest case, where we have just one generator $\xi$ for our
algebra of random variables. The algebra of random variables is then necessarily
commutative and can be identified with the algebra of formal power series in
one variable. The simplest example of a matrix-valued random variable is this:
$\xi$ is an N x N hermitean matrix whose entries are mutually independent random
variables of zero mean and unit variance. More precisely, the average of
$\xi^n$ is

$$\langle \xi^n \rangle = \int \frac{1}{N}\, \mathrm{tr}\, \xi^n\, e^{-\frac{N}{2} \mathrm{tr}\, \xi^2}\, \frac{d^{N^2} \xi}{Z_N}.$$ (26)

The normalization constant $Z_N$ is chosen such that $\langle 1 \rangle = 1$.

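The Wigner distribution with unit covariance is the limit as $N \to \infty$:

$$\Gamma_n = \lim_{N \to \infty} \int \frac{1}{N}\, \mathrm{tr}\, \xi^n\, e^{-\frac{N}{2} \mathrm{tr}\, \xi^2}\, \frac{d^{N^2} \xi}{Z_N}.$$ (27)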

The factorized Schwinger-Dyson equations reduce to the following recursion 
relations for the moments: 

$$\Gamma_{k+1} = \sum_{m+n=k-1} \Gamma_m \Gamma_n.$$ (28)

Clearly the odd moments vanish and $\Gamma_0 = 1$. Set $\Gamma_{2k} = C_k$. Then

$$C_{k+1} = \sum_{m+n=k} C_m C_n.$$ (29)

The solution of the recursion relations gives the moments in terms of the
Catalan numbers

$$\Gamma_{2k} = C_k = \frac{1}{k+1} \binom{2k}{k}.$$ (30)
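As a quick check, the recursion for the moments can be iterated directly and compared with the closed form for the Catalan numbers. The Python sketch below is illustrative only; the function names are chosen here and do not come from the paper.

```python
from math import comb

def wigner_moments(kmax):
    """Moments Gamma_0 .. Gamma_kmax of the unit Wigner distribution, from
    Gamma_{k+1} = sum_{m+n=k-1} Gamma_m Gamma_n with Gamma_0 = 1, Gamma_1 = 0."""
    gamma = [1, 0]
    for k in range(1, kmax):
        gamma.append(sum(gamma[m] * gamma[k - 1 - m] for m in range(k)))
    return gamma[:kmax + 1]

def catalan(k):
    """Closed form C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

moments = wigner_moments(8)
```

The odd entries of `moments` vanish and the even entries reproduce `catalan(k)`, in line with the solution quoted above.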
3.2 The Multivariable Wigner Distribution

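Let $K^{ij}$ be a positive matrix; i.e., such that

$$K^{ij}\, u_i^*\, u_j \geq 0$$ (31)

for all vectors $u$, with zero occurring only for $u = 0$. Then the moments of the
Wigner distribution on the generators $\xi_i$, $i = 1, 2, \dots, M$, are given by

$$\Gamma_{i_1 \cdots i_n} = \lim_{N \to \infty} \int \frac{1}{N}\, \mathrm{tr}\, [\xi_{i_1} \cdots \xi_{i_n}]\, e^{-\frac{N}{2} K^{ij}\, \mathrm{tr}\, \xi_i \xi_j}\, \frac{d^{N^2 M} \xi}{Z_N}.$$ (32)

(Again, $Z_N$ is chosen such that $\langle 1 \rangle = 1$.) It is obvious that
the moments of odd order vanish; also, that the second moment is

$$\Gamma_{ij} = [K^{-1}]_{ij}.$$ (33)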

The higher order moments are given by the recursion relations: 

$$\Gamma_{iI} = \Gamma_{ij}\, \delta_I^{I_1 j I_2}\, \Gamma_{I_1} \Gamma_{I_2}.$$ (34)

Note that each term on the rhs corresponds to a partition of the original path 
I into subsequences that preserve the order. By repeated application of the 
recursion the rhs can be written in terms of a sum over all such 'non-crossing 
partitions' into pairs. The Catalan number Ck is simply the number of such 
non-crossing partitions into pairs of a sequence of length 2k. 
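The recursion for the higher moments can be turned into a short memoized routine. This is a sketch under assumptions made here: generators are indexed from 0, the covariance $\Gamma_{ij}$ is passed as a nested list `cov`, and the names `make_wigner` and `gamma` are hypothetical.

```python
from functools import lru_cache

def make_wigner(cov):
    """Return a function path -> moment of the Wigner distribution.

    cov[i][j] is the second moment Gamma_{ij}. Higher moments follow from
    Gamma_{iI} = Gamma_{ij} * (sum over splittings I = I1 j I2) Gamma_{I1} Gamma_{I2}.
    """
    @lru_cache(maxsize=None)
    def gamma(path):
        if len(path) == 0:
            return 1          # normalization
        if len(path) % 2 == 1:
            return 0          # odd moments vanish
        i, rest = path[0], path[1:]
        return sum(cov[i][rest[p]] * gamma(rest[:p]) * gamma(rest[p + 1:])
                   for p in range(len(rest)))
    return gamma

# Two generators with unit covariance
gamma = make_wigner([[1, 0], [0, 1]])
crossing = gamma((0, 1, 0, 1))      # only a crossing pairing is available
non_crossing = gamma((0, 1, 1, 0))  # one non-crossing pairing
```

For unit covariance a moment simply counts the non-crossing pairings that match equal indices, so the word `(0, 1, 0, 1)` gives zero while `(0, 1, 1, 0)` gives one; with a single generator the Catalan numbers are recovered.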

Our strategy for studying more general probability distributions will be to
transform them to the Wigner distribution by a nonlinear change of variables. 
Hence the group of such transformations is of importance to us. In the next 
section we will study this group. 



4 Automorphisms of the Free algebra 

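The free algebra generated by the $\xi_i$ remains unchanged (is isomorphic) if we
change to a new set of generators,

$$\xi_i \mapsto \phi(\xi)_i = \sum_{m=0}^{\infty} \phi_i^{i_1 \cdots i_m}\, \xi_{i_1} \cdots \xi_{i_m},$$ (35)

provided that this transformation is invertible. We will often abbreviate
$\xi_I = \xi_{i_1} \cdots \xi_{i_m}$ so that the above equation would be
$\phi(\xi)_i = \phi_i^I \xi_I$. The composition of two transformations $\psi$
and $\phi$ can be seen to be

$$[(\psi \circ \phi)_i]^K = \sum_{n \leq |K|} \delta^K_{P_1 \cdots P_n}\, \psi_i^{j_1 \cdots j_n}\, \phi_{j_1}^{P_1} \cdots \phi_{j_n}^{P_n}.$$ (36)

Note that the composition involves only finite series even when each of the
series $\phi$ and $\psi$ is infinite.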

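The inverse, $(\phi^{-1})_i(\xi) \equiv \chi_i(\xi)$, is determined by the conditions

$$[(\chi \circ \phi)_i]^K = \delta^K_{P_1 \cdots P_n}\, \chi_i^{j_1 \cdots j_n}\, \phi_{j_1}^{P_1} \cdots \phi_{j_n}^{P_n} = \delta_i^K.$$ (37)

They can be solved recursively for $\chi_i^J$:

$$\chi_i^j = [(\phi_1)^{-1}]_i^j, \qquad \phi_1 \equiv (\phi_i^j),$$
$$\chi_i^{j_1 j_2} = -\chi_{k_1}^{j_1}\, \chi_{k_2}^{j_2}\, \chi_i^{l_1}\, \phi_{l_1}^{k_1 k_2},$$
$$\chi_i^{j_1 \cdots j_n} = -\sum_{m < n} \delta^{k_1 \cdots k_n}_{P_1 \cdots P_m}\, \chi_{k_1}^{j_1} \cdots \chi_{k_n}^{j_n}\, \chi_i^{l_1 \cdots l_m}\, \phi_{l_1}^{P_1} \cdots \phi_{l_m}^{P_m}.$$ (38)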

Thus an automorphism $\phi$ has an inverse as a formal power series if and only if
the linear term has an inverse; i.e., if the determinant of the matrix
$\phi_i^j$ is non-zero. The set of such automorphisms forms a group
$\mathcal{G}_M = \mathrm{Aut}\, \mathcal{T}_M$. This
group plays a crucial role in our theory.
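For a single generator the recursive solution for the inverse reduces to the familiar reversion of a power series. The Python sketch below works out this special case only, as an illustration: it assumes $M = 1$, a series with no constant term and a non-zero linear coefficient, truncation at a finite order, and function names chosen here.

```python
from fractions import Fraction

def series_mul(a, b, order):
    """Product of two truncated series (list index = power of x)."""
    c = [Fraction(0)] * (order + 1)
    for p, ap in enumerate(a):
        for q, bq in enumerate(b):
            if p + q <= order:
                c[p + q] += ap * bq
    return c

def compose(chi, phi, order):
    """Coefficients of chi(phi(x)) up to x^order; phi has no constant term."""
    result = [Fraction(0)] * (order + 1)
    power = [Fraction(1)] + [Fraction(0)] * order   # phi(x)^0
    for m in range(order + 1):
        if m < len(chi):
            for k in range(order + 1):
                result[k] += chi[m] * power[k]
        power = series_mul(power, phi, order)
    return result

def inverse(phi, order):
    """Solve chi(phi(x)) = x order by order (the one-generator recursion)."""
    chi = [Fraction(0), 1 / Fraction(phi[1])]
    # powers[m][n] = coefficient of x^n in phi(x)^m
    powers = [[Fraction(1)] + [Fraction(0)] * order]
    for m in range(1, order + 1):
        powers.append(series_mul(powers[-1], phi, order))
    for n in range(2, order + 1):
        s = sum(chi[m] * powers[m][n] for m in range(1, n))
        chi.append(-chi[1] ** n * s)
    return chi

phi = [0, 1, 1]            # phi(x) = x + x^2
chi = inverse(phi, 5)
check = compose(chi, phi, 5)
```

Here `check` reduces to the identity series up to the truncation order, and only the invertibility of the linear coefficient `phi[1]` is needed, mirroring the statement about the linear term.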

4.1 Transformation of Moments under $\mathcal{G}_M$

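Given an automorphism $\phi$,

$$[\phi(\xi)]_i = \phi_i^I\, \xi_I,$$ (39)

and a probability distribution with moments $G_I$, the expectation value of
$[\phi(\xi)]_{i_1} \cdots [\phi(\xi)]_{i_n}$ is

$$[\phi_* G]_{i_1 \cdots i_n} = \phi_{i_1}^{J_1} \cdots \phi_{i_n}^{J_n}\, G_{J_1 \cdots J_n},$$ (40)

where the repeated path indices $J_1, \dots, J_n$ are summed over paths of all
lengths.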
We might regard these as the moments of some new probability distribution.
There is a technical problem however: the sums may not converge, and the group
$\mathcal{G}_M$ of formal power series includes many transformations that may
not map positive tensors to positive tensors; they may not preserve the
'measure class' of the joint probability distribution $G_I$.

Given some fixed jpd (say the Wigner distribution with unit covariance),
there is a subgroup $\tilde{\mathcal{G}}_M$ that maps it to probability
distributions; this is the open subset of $\mathcal{G}_M$ defined by the
inequalities^

$$\tilde{\mathcal{G}}_M = \{ \phi \in \mathcal{G}_M \mid [\phi_* \Gamma]_{\bar{I} J}\, u^{*I} u^{J} \geq 0 \}$$ (41)

for all polynomials u. Thus in the neighborhood of the identity the two groups
$\mathcal{G}_M$ and $\tilde{\mathcal{G}}_M$ are the same; in particular, they
have the same Lie algebra. The point is that $\mathcal{G}_M$ and
$\tilde{\mathcal{G}}_M$ are Lie groups under different topologies: the series
in $\tilde{\mathcal{G}}_M$ have to satisfy convergence conditions implied by
the above inequalities.

It is plausible that any probability distribution can be obtained from a fixed 
one by some automorphism; indeed there should be many such automorphisms. 
As a simple example, the Wigner distribution with covariance $G_{ij}$ can be
obtained from the one with covariance $\delta_{ij}$ by the linear automorphism
$[\phi(\xi)]_i = \phi_i^j \xi_j$, provided that
$G_{ij} = \sum_k \phi_i^k \phi_j^k$. Thus the space of Wigner distributions (the
space of positive covariance matrices) is the coset space $GL_M / O_M$.

In the same spirit, we will regard the space of all probability distributions
as the coset space of the group of automorphisms
$\mathcal{G}_M / \mathcal{SG}_M$, where $\mathcal{SG}_M$ is the
subgroup of automorphisms that leave the Wigner distribution of unit covariance
invariant. We can parametrize an arbitrary distribution with moments $G_I$ by
the transformation that relates it to the unit Wigner distribution with moments
$\Gamma_I$:

$$G_{i_1 \cdots i_n} = [\phi_* \Gamma]_{i_1 \cdots i_n} = \phi_{i_1}^{J_1} \cdots \phi_{i_n}^{J_n}\, \Gamma_{J_1 \cdots J_n}.$$ (42)

Indeed, we will see below that all moments that differ infinitesimally from 
a given one are obtained by such infinitesimal transformations. To rigorously 

^ $\bar{I}$ denotes the reverse of the sequence: $I = i_1 i_2 \cdots i_n$, $\bar{I} = i_n i_{n-1} \cdots i_1$.
justify our point of view, we must prove that (in an appropriate topology) the
Lie group $\mathcal{G}$ is the exponential of this Lie algebra. We will not address this
somewhat technical issue in this paper. In the next section we describe the Lie 
algebra in some more detail. 

4.2 The Lie algebra of Derivations 

An automorphism that differs only infinitesimally from the identity is a
derivation of the tensor algebra. Any derivation of the free algebra is
determined by its effect on the generators $\xi_i$. They can be written as
linear combinations $v_i^I L^i_I$, where the basis elements $L^i_I$ are defined
by

$$[L^i_I\, \xi]_j = \delta^i_j\, \xi_I.$$ (43)

The derivations form a Lie algebra with commutation relations

$$[L^i_I, L^j_J] = \delta_J^{J_1 i J_2}\, L^j_{J_1 I J_2} - \delta_I^{I_1 j I_2}\, L^i_{I_1 J I_2}.$$ (44)

The change of the moments under such a derivation is:

$$[L^i_I\, G]_J = \delta_J^{J_1 i J_2}\, G_{J_1 I J_2}.$$ (45)
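The commutation relations of the derivations can be verified mechanically on polynomials in the generators. The sketch below is an illustration under assumptions made here: a polynomial is a dictionary from words (tuples of generator indices) to integer coefficients, a basis derivation sends the generator `i` to the word `I` and annihilates the others, and all function names are hypothetical.

```python
from collections import defaultdict

def derive(i, path, u):
    """Apply the derivation xi_i -> xi_path (other generators -> 0) to u,
    extended to words by the Leibniz rule."""
    out = defaultdict(int)
    for wd, coeff in u.items():
        for p, letter in enumerate(wd):
            if letter == i:
                out[wd[:p] + path + wd[p + 1:]] += coeff
    return {w: c for w, c in out.items() if c != 0}

def combine(u, v, sign):
    """Return u + sign * v with zero coefficients dropped."""
    out = defaultdict(int, u)
    for w, c in v.items():
        out[w] += sign * c
    return {w: c for w, c in out.items() if c != 0}

def commutator(i, I, j, J, u):
    """[L^i_I, L^j_J] u computed directly from the definition."""
    return combine(derive(i, I, derive(j, J, u)),
                   derive(j, J, derive(i, I, u)), -1)

def commutator_rhs(i, I, j, J, u):
    """Right-hand side of the commutation relation, applied to u."""
    out = {}
    for p, letter in enumerate(J):      # splittings J = J1 i J2
        if letter == i:
            out = combine(out, derive(j, J[:p] + I + J[p + 1:], u), +1)
    for p, letter in enumerate(I):      # splittings I = I1 j I2
        if letter == j:
            out = combine(out, derive(i, I[:p] + J + I[p + 1:], u), -1)
    return out

# Hypothetical test data
u = {(1, 2, 1): 1, (2,): 3}
lhs = commutator(1, (1, 2), 2, (2, 1, 1), u)
rhs = commutator_rhs(1, (1, 2), 2, (2, 1, 1), u)
```

Since both sides are derivations, agreement on the generators implies agreement on every polynomial; `lhs` and `rhs` above give one such comparison on a test polynomial.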

We already encountered these infinitesimal variations in the derivation of the
Schwinger-Dyson equation.

Let us consider the infinitesimal neighborhood of some reference distribution
$\Gamma_I$, for example the unit Wigner distribution. We will assume that
$\Gamma_I$ satisfies the strict positivity condition,
$\Gamma_{\bar{I} J}\, u^{*I} u^{J} > 0$ for $u \neq 0$; i.e., that this quadratic
form vanishes only when the polynomial u is identically zero. (This condition is
satisfied by the unit Wigner distribution.) It is the analogue of the condition
in classical probability theory that the probability distribution does not vanish
in some neighborhood of the origin. Then, the Hankel matrix
$H_{I;J} = \Gamma_{IJ}$ is invertible on polynomials: it is an inner product.

The infinitesimal change of moments under a derivation $v = v_i^I L^i_I$ is

$$[L_v \Gamma]_{k_1 \cdots k_n} = v_{k_1}^I\, \Gamma_{I k_2 \cdots k_n} + \text{cyclic permutations in } (k_1 \cdots k_n).$$ (46)
Now, it is clear that the addition of an arbitrary infinitesimal cyclically
symmetric tensor $g_I$ to $\Gamma_I$ can be achieved by some derivation: we just
find some tensor $w_I$ of which $g_I$ is the cyclically symmetric part and put
$w_{k_1 k_2 \cdots k_n} = v_{k_1}^I\, \Gamma_{I k_2 \cdots k_n}$.
Since the Hankel matrix is invertible, we can always find such a v. Thus an
arbitrary infinitesimal change in $\Gamma_I$ can be achieved by some $v_i^I$.

Indeed there will be many such derivations, differing by those that leave
$\Gamma_I$ invariant. The isotropy Lie algebra of $\Gamma_I$ is defined by

$$v_{k_1}^I\, \Gamma_{I k_2 \cdots k_n} + \text{cyclic permutations in } (k_1 \cdots k_n) = 0.$$ (47)

We can simplify this condition for the choice where $\Gamma_I$ is the Wigner
distribution. For $n = 1$ this is just

$$v_{k_1}^I\, \Gamma_I = 0;$$ (48)

for $n = 2$, we get, using the recursion relation for Wigner moments,

$$\Gamma_{k_2 j}\, v_{k_1}^{I_1 j I_2}\, \Gamma_{I_1} \Gamma_{I_2} + (k_1 \leftrightarrow k_2) = 0.$$ (49)
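In general, using the iterations of the Wigner recursion relation

$$\Gamma_{I k_1 k_2} = \Gamma_{k_1 k_2}\, \Gamma_I + \Gamma_{k_1 j_2}\, \Gamma_{k_2 j_1}\, \delta_I^{I_1 j_1 I_2 j_2 I_3}\, \Gamma_{I_1} \Gamma_{I_2} \Gamma_{I_3}$$ (50)

etc. (the first term on the right does not contribute to the isotropy
condition, by the $n = 1$ condition above), we get

$$\Gamma_{k_1 j_{n-1}}\, \Gamma_{k_2 j_{n-2}} \cdots \Gamma_{k_{n-1} j_1}\, v_{k_n}^{I_1 j_1 I_2 \cdots j_{n-1} I_n}\, \Gamma_{I_1} \cdots \Gamma_{I_n}$$ (51)

$$+\ \text{cyclic permutations in } (k_1 \cdots k_n) = 0.$$ (52)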

In other words, we should lower a certain number of indices on $v_i^I$ using the
second moment and contract away the rest; the resulting tensor should not have
a cyclically symmetric part. It would be interesting to find the solutions of these
conditions more explicitly. We will not need this for our present work.






5 The Action Principle and Cohomology 

We seek an action $\Omega(G)$ such that its variation under an infinitesimal
automorphism $[L^i_I\, G]_J = \delta_J^{J_1 i J_2}\, G_{J_1 I J_2}$ is the
factorized SD equation:

$$L^i_I\, \Omega(G) = S^{J_1 i J_2}\, G_{J_1 I J_2} + \delta_I^{I_1 i I_2}\, G_{I_1} G_{I_2}.$$ (53)

It is easy to identify a quantity that will give the first term:

$$L^i_I\, [S^J G_J] = S^{J_1 i J_2}\, G_{J_1 I J_2}.$$ (54)

This term is simply the expectation value of the matrix model action. So

$$\Omega(G) = S^J G_J + \chi(G)$$ (55)

with

$$L^i_I\, \chi(G) = \eta^i_I \equiv \delta_I^{I_1 i I_2}\, G_{I_1} G_{I_2}.$$ (56)

This term arises from the change in the measure of integration over matrices; 
hence it is a kind of 'anomaly'. 

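Now, in order for such a function $\chi(G)$ to exist, the anomaly $\eta^i_I$
must satisfy an integrability condition
$L^i_I (L^j_J \chi) - L^j_J (L^i_I \chi) = [L^i_I, L^j_J]\, \chi$, i.e.,

$$L^i_I\, \eta^j_J - L^j_J\, \eta^i_I - \delta_J^{J_1 i J_2}\, \eta^j_{J_1 I J_2} + \delta_I^{I_1 j I_2}\, \eta^i_{I_1 J I_2} = 0.$$ (57)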

A straightforward but tedious calculation shows that this is indeed satisfied. We 
were not able to find a formal power series of moments satisfying this condition 
even after many attempts. 

Then we realized that, even in the case of a single matrix (treated in the
appendix), there is no solution of the above equation! The condition above is
in fact the statement that $\eta^i_I(G)$ is a one-cocycle of the Lie-algebra
cohomology of the Lie algebra of $\mathcal{G}_M$, valued in the space of formal
power series in G. (See the appendix.) Although $\eta$ itself is a quadratic
polynomial in the G, there is no polynomial or even formal power series of
which it is a variation: it represents a nontrivial element of the cohomology
of this Lie algebra, twisted by its representation on the space of
formal power series in the moments.






We need to look for $\chi$ in some larger class of functions on the space $\mathcal{P}_M$
of probability distributions. Now, $\mathcal{P}_M = \mathcal{G}_M/\mathcal{SG}_M$, a coset space of the group
of automorphisms. We can parametrize the moments in terms of the automorphism
that will bring them to a standard one: $G_I = [\phi_*\Gamma]_I$. So, another way
of thinking of functions on $\mathcal{P}_M$ would be as functions on $\mathcal{G}_M$ invariant under
the action* of $\mathcal{SG}_M$. Thus, instead of power series in $G_I$, we will have power
series in the coefficients $\phi_i^I$ determining an automorphism. In order to stand in
for a function on $\mathcal{P}_M$, such a power series would have to be invariant under the
subgroup $\mathcal{SG}_M$.

Clearly, any power series of $G_I$ can be expressed as a power series of the $\phi_i^I$:
simply substitute $[\phi_*\Gamma]_I$ for $G_I$. But there could be a power series in $\phi$ that
is invariant under $\mathcal{SG}_M$ and still is not expressible as a power series in G. This
subtle distinction is the origin of the cohomology we are discussing^. We can
now guess that the quantity we seek is a function of this type on $\mathcal{G}_M$.

A hint is also provided by the origin of the term $\eta^i_I(G)$ in the Schwinger-Dyson
equations. It arises from the change of the measure of integration over
matrices under an infinitesimal, but nonlinear, change of variables. Thus, it
should be useful to study this change of measure under a finite nonlinear change
of variables, that is, an automorphism.

More precisely, let $\phi(A)$ be a nonlinear transformation $A_i \mapsto \phi(A)_i = \sum_{n=1}^{\infty}\phi_i^{i_1\cdots i_n}A_{i_1}\cdots A_{i_n}$ on the space of hermitean $N\times N$ matrices. Also, let $\sigma(\phi,A) = \frac{1}{N^2}\log\det J(\phi,A)$, where $J(\phi,A)$ is the Jacobian matrix of $\phi$.

By the multiplicative property of Jacobians, we have 

$\sigma(\phi_1\phi_2, A) = \sigma(\phi_1, \phi_2(A)) + \sigma(\phi_2, A).$ (58)
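The multiplicative property can be checked numerically in the simplest setting of a single commuting variable, where the Jacobian is just the derivative and $\sigma(\phi,x) = \log|\phi'(x)|$. The sketch below is only an illustration under that assumption: the two polynomial maps `phi1` and `phi2`, the sample point and all function names are hypothetical and are not taken from the text.

```python
import math

def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[n] * x**n) by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def poly_deriv(coeffs):
    """Coefficients of the derivative of the polynomial."""
    return [n * c for n, c in enumerate(coeffs)][1:]

def sigma(coeffs, x):
    """log |phi'(x)| for the polynomial map phi with these coefficients."""
    return math.log(abs(poly_eval(poly_deriv(coeffs), x)))

def sigma_composed(c1, c2, x, h=1e-6):
    """log |d/dx phi1(phi2(x))| from a central finite difference."""
    f = lambda t: poly_eval(c1, poly_eval(c2, t))
    return math.log(abs((f(x + h) - f(x - h)) / (2 * h)))

phi1 = [0.0, 2.0, 0.0, 0.5]   # phi1(x) = 2x + 0.5 x^3 (hypothetical)
phi2 = [0.0, 1.5, 0.3]        # phi2(x) = 1.5x + 0.3 x^2 (hypothetical)
x = 0.7

lhs = sigma_composed(phi1, phi2, x)
rhs = sigma(phi1, poly_eval(phi2, x)) + sigma(phi2, x)
print(lhs, rhs)
```

With these choices `lhs` and `rhs` agree up to the finite-difference error, which is the one-variable shadow of equation (58).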



For example, $\sigma(\phi,A) = \log\det\phi_0$ if $[\phi(A)]_i = [\phi_0]_i^j A_j$ is a linear transformation: the Jacobian matrix is then a constant. It is convenient to factor out this linear transformation and write

$\phi_i(A) = [\phi_0]_i^j\left[A_j + \sum_{n=2}^{\infty}\tilde\phi_j^{i_1\cdots i_n}A_{i_1}\cdots A_{i_n}\right].$

*Such an idea was used successfully to solve a similar problem on cohomologies [21].
^We give a simple example in the appendix.



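We will show in the appendix that $\sigma(\phi, A)$ can be written in terms of the
traces $\Phi_I = \frac{1}{N}\mathrm{tr}\,A_I$:

$\sigma(\phi,A) = \log\det\phi_0 + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\,\tilde\phi_{i_1}^{K_1 i_2 L_1}\tilde\phi_{i_2}^{K_2 i_3 L_2}\cdots\tilde\phi_{i_n}^{K_n i_1 L_n}\,\Phi_{K_1\cdots K_n}\Phi_{L_n\cdots L_1},$

where $\tilde\phi$ denotes the coefficients of the non-linear part of $\phi$ once the linear transformation $\phi_0$ has been factored out.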


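Thus, the expectation value of $\sigma(\phi, A)$ with respect to some distribution can
be expressed in terms of its moments $G_I = \langle\Phi_I\rangle$, in the large N limit:

$\langle\sigma(\phi,A)\rangle = c(\phi,G) = \log\det\phi_0 + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\,\tilde\phi_{i_1}^{K_1 i_2 L_1}\tilde\phi_{i_2}^{K_2 i_3 L_2}\cdots\tilde\phi_{i_n}^{K_n i_1 L_n}\,G_{K_1\cdots K_n}G_{L_n\cdots L_1}.$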

The above equation for $\sigma(\phi_1\phi_2,A)$ then shows that the expectation value
$c(\phi, G)$ satisfies the cocycle condition:

$c(\phi_1\phi_2, G) = c(\phi_1,\phi_{2*}(G)) + c(\phi_2, G).$ (60)

Moreover, if we now restrict to the case of infinitesimal transformations, $\phi(A)_i = A_i + v_i^I A_I$, this $c(\phi, G)$ reduces to $\eta$:

$c(\phi,G) = v_i^I\eta^i_I(G) + O(v^2).$ (61)



Let us look at it another way: let $G = \phi_*\Gamma$ for some reference probability
distribution $\Gamma$. Then the cocycle condition gives

$c(\phi_1,G) = c(\phi_1\phi,\Gamma) - c(\phi,\Gamma).$ (62)

Choosing $\phi_1$ to be infinitesimal gives then,

$\eta^i_I(G) = L^i_I\,c(\phi,\Gamma).$ (63)

Thus, we have solved our problem: $\chi(\phi) = c(\phi,\Gamma)$ is a function on $\mathcal{G}_M$ whose
variation is $\eta$. But is it really a function on $\mathcal{P}_M$? In other words, is $\chi(\phi)$
invariant under the right action of $\mathcal{SG}_M$?

If $\phi_{2*}\Gamma = \Gamma$ the cocycle condition reduces to

$c(\phi\phi_2,\Gamma) = c(\phi,\Gamma) + c(\phi_2,\Gamma).$ (64)

We need to show that the last term is zero.

We will only consider the case where the reference distribution $\Gamma$ is the unit
wignerian. If $\phi_2(A)_i = A_i + v_i^I A_I$ is infinitesimal, $c(\phi_2,\Gamma)$ is just $v_i^I\eta^i_I(\Gamma) = v_i^{I_1 i I_2}\Gamma_{I_1}\Gamma_{I_2}$. But, since $v$ must leave the Wigner moments $\Gamma_I$ unchanged, it must
satisfy (49). If we contract that equation by $\Gamma^{k_1k_2}$ we will get $v_i^{I_1 i I_2}\Gamma_{I_1}\Gamma_{I_2} = 0$.
Thus $\chi(\phi)$ is invariant at least under an infinitesimal $\phi_2\in\mathcal{SG}$. Within the
ultrametric topology of formal power series, the group $\mathcal{SG}$ should be connected, so
that any element can be reached by a succession of infinitesimal transformations.

To summarize, we now have an action principle for matrix models,

$\Omega(\phi) = S^I[\phi_*\Gamma]_I + \log\det\phi_0 + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\,\tilde\phi_{i_1}^{K_1 i_2 L_1}\tilde\phi_{i_2}^{K_2 i_3 L_2}\cdots\tilde\phi_{i_n}^{K_n i_1 L_n}\,\Gamma_{K_1\cdots K_n}\Gamma_{L_n\cdots L_1}.$

The factorized Schwinger-Dyson equations follow from requiring that this action
be extremal under infinitesimal variations of $\phi$. By choosing an ansatz for
$\phi$ that depends only on a few parameters and maximizing $\Omega$ with respect to
them, we can get approximate solutions to the factorized SD equations.

6 Entropy of Non-commutative Probability Distributions

Whenever we restrict the set of allowed observables of a system, some entropy 
is created: it measures our ignorance of the variables we are not allowed to 
measure. Familiar examples arise from thermodynamics, where only a finite
number of macroscopic parameters are measured. Another example is blackhole
physics, where only the charge, mass and angular momentum of a blackhole can be
measured by external particle scattering: the interior of a blackhole is not
observable to an outside observer.

There should be a similar entropy in the theory of strong interactions due 
to confinement: only 'macroscopic' observables associated to hadrons are mea- 
surable by a scattering of hadrons against each other. Quarks and gluons are 
observable only in this indirect way. More precisely, only color invariant observ- 
ables are measurable. 

In this paper, we have a toy model of this entropy due to confinement of
gluons: we restrict to the gauge invariant functions $\Phi_I = \frac{1}{N}\mathrm{tr}\,A_I$ of the
matrices $A_1\cdots A_M$. It turns out that the term $\chi$ in the action principle above
is just the entropy caused by this restriction.

Let $Q$ be some space of 'microscopic' variables with a probability measure
$\mu$, and $\phi: Q\to\tilde Q$ some map to a space of 'macroscopic' variables. We can now
define the volume of any subset of $\tilde Q$ to be the volume of its preimage in $Q$: this
is the induced measure $\tilde\mu$ on $\tilde Q$.

In particular we can consider the volume of the pre-image of a point $q\in\tilde Q$.
It is a measure of our ignorance of the microscopic variables, when $q$ is the result
of measuring the macroscopic ones. Any monotonic function of this volume is
just as good a measure of this ignorance. The best choice is the logarithm of this
volume, since then it would be additive for statistically independent systems.
Let us denote this function on $\phi(Q)$ by

$\sigma(q) = \log\mu(\phi^{-1}(q)).$ (65)

The average of this quantity over $\tilde Q$ is the entropy of the induced probability
distribution $\tilde\mu$.

Let us apply this idea to the case where the 'microscopic' observable is a
single hermitean $N\times N$ matrix $A$; the 'macroscopic' observable is the spectrum,
the set of eigenvalues. We disregard the information in the basis in which $A$ is
presented. We do so even if this information is measurable in principle; e.g., by
interference experiments in quantum mechanics. The natural measure on the
space of matrices is the uniform (Lebesgue) measure $dA$ on $R^{N^2}$. Although the
uniform measure is not normalizable, the volume of the space of matrices with a
given spectrum $\{a_1,\cdots a_N\}$ is finite. Up to a constant (i.e., depending only
on $N$), it is $\prod_{1\le i<j\le N}(a_i-a_j)^2$. Thus the entropy is $2\int_{x>y}\rho(x)\rho(y)\log|x-y|\,dx\,dy$ where $\rho(x) = \frac{1}{N}\sum_{i=1}^{N}\delta(x-a_i)$. This expression makes sense even in the
limit $N\to\infty$: we get a continuous distribution of eigenvalues $\rho(x)$.
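To make the large N limit of this entropy concrete, one can evaluate the discrete sum $\frac{1}{N^2}\sum_{i\neq j}\log|a_i-a_j|$ on a deterministic set of points distributed like the unit Wigner semicircle. The sketch below is an illustration only: it places the points at quantiles of the semicircle law instead of diagonalizing a random matrix, and it compares with the continuum value $-1/4$ for the radius-2 semicircle, which is a standard result quoted here as an assumption and is not stated in the text. All function names are hypothetical.

```python
import math

def semicircle_cdf(x):
    """CDF of rho_0(x) = sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    return (0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi)
            + math.asin(x / 2.0) / math.pi)

def quantile(p, tol=1e-12):
    """Invert the CDF by bisection on [-2, 2]."""
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_entropy(points):
    """(1/N^2) * sum over i != j of log|a_i - a_j|."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.log((points[i] - points[j]) ** 2)
    return total / n ** 2

N = 300
eigs = [quantile((k + 0.5) / N) for k in range(N)]
estimate = discrete_entropy(eigs)
print(estimate)  # approaches -1/4 as N grows, with corrections of order log(N)/N
```

The discrete sum omits the coincident pairs, so at finite N it sits slightly above the continuum value.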

What is the 'joint spectrum' of a collection of $M$ hermitean matrices $A_1\cdots A_M$?
Clearly they cannot be simultaneously diagonalized, so a direct definition of this
concept is impossible. Now, recall that the set of eigenvalues $\{a_1\cdots a_N\}$ can
be recovered from the elementary symmetric functions $G_n = \frac{1}{N}\sum_{i=1}^{N}a_i^n$ as the
solutions of the algebraic equation

$x^N = G_1x^{N-1} - G_2x^{N-2} + \cdots(-1)^{N-1}G_N.$ (66)

Here the coefficients are to be understood as the elementary symmetric polynomials of the eigenvalues, which the moments $G_1,\cdots G_N$ determine through Newton's identities.
The moments $G_n$ for $n > N$ are not independent: they can be expressed as
polynomials of the lower ones. Although the set $\{a_1\cdots a_N\}$ is determined by
the sequence $G_1,\cdots G_N$, there is no explicit algebraic formula: Galois theory
shows that this is impossible for $N > 4$. Galois theory also shows that any gauge
invariant polynomial of $A$ can be expressed as a polynomial of the $G_1,\cdots G_N$.
Thus we can regard this sequence $\frac{1}{N}\mathrm{tr}\,A, \frac{1}{N}\mathrm{tr}\,A^2,\cdots$ as the spectrum of the
matrix $A$.
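The passage from moments to the characteristic equation can be sketched with Newton's identities, which convert the power sums $p_k = \sum_i a_i^k = N G_k$ into the elementary symmetric polynomials $e_k$ that appear as coefficients of $\prod_i(x-a_i)$. The example is a hedged illustration: the sample spectrum and the function names are hypothetical, and exact rational arithmetic is used only to keep the check transparent.

```python
from fractions import Fraction

def elementary_from_power_sums(p):
    """Newton's identities: k e_k = sum_{i=1..k} (-1)^(i-1) e_{k-i} p_i.

    p[k-1] holds the power sum p_k; returns [e_1, ..., e_N].
    """
    n = len(p)
    e = [Fraction(1)]  # e_0 = 1
    for k in range(1, n + 1):
        acc = Fraction(0)
        for i in range(1, k + 1):
            acc += (-1) ** (i - 1) * e[k - i] * p[i - 1]
        e.append(acc / k)
    return e[1:]

def char_poly(e):
    """Coefficients of prod_i (x - a_i), highest power first."""
    return [Fraction(1)] + [(-1) ** k * ek for k, ek in enumerate(e, start=1)]

eigenvalues = [1, 2, 3]                      # hypothetical spectrum
N = len(eigenvalues)
power_sums = [Fraction(sum(a ** k for a in eigenvalues)) for k in range(1, N + 1)]
e = elementary_from_power_sums(power_sums)   # e_1, e_2, e_3
coeffs = char_poly(e)                        # x^3 - e_1 x^2 + e_2 x - e_3
print(e, coeffs)
```

Recovering the eigenvalues themselves then requires solving this polynomial, which is where the absence of a general algebraic formula for $N > 4$ enters.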

The volume $\prod_{i<j}(a_i-a_j)^2$ of the space of matrices with a given spectrum is
a symmetric polynomial of order $N(N-1)$ in the eigenvalues. Hence, in principle,
it can be expressed as a polynomial in $G_1,\cdots G_N$, although there does not appear
to be a simple universal formula*.

This point of view suggests a generalization to several matrices: we can define
the joint spectrum of a collection of matrices to be the quantities $G_{i_1\cdots i_n} = \frac{1}{N}\mathrm{tr}\,A_{i_1}\cdots A_{i_n}$. Again, there are relations among these quantities when $I$
is longer than $N$; but it is difficult to characterize these relations explicitly.
Nevertheless, it is meaningful to ask for the volume (with respect to the uniform
measure on $R^{MN^2}$) of the set of all matrices with a given value for the sequence
$G_{i_1\cdots i_n}$. Again, we will not get any explicit formula for entropy by pursuing this
point of view.

*It is possible to get a formula for the volume in terms of the first $2N$ moments. The
complication is that only the first $N$ moments can be freely specified. The remaining moments
are determined by these, and yet, there is no algebraic formula that expresses $G_{N+1}\cdots G_{2N}$
in terms of $G_1\cdots G_N$.

So we look for yet another way to think of the joint spectrum of a collection
of matrices. We can ask how the entropy of a collection of matrices with joint
spectrum $G_I$ changes if we transform them by some power series:

$A_i\mapsto\phi(A)_i = \phi_i^I A_I.$ (67)

Let $c(\phi, G)$ be this change. Then, if we perform another transformation, we
must have

$c(\phi\phi_2,G) = c(\phi,\phi_{2*}G) + c(\phi_2,G);$ (68)

i.e., it must be a 1-cocycle. Under infinitesimal variations, it reduces to $\eta$, since
it is just the infinitesimal change in the uniform measure $dA$.

In the last section we obtained this $c(\phi, G)$ explicitly as a formal power series
in $G$. It can be written as the variation

$c(\phi,G) = \chi(\phi_*(G)) - \chi(G)$ (69)

of some function $\chi$ of the joint spectrum $G$. However this $\chi$ is not a formal
power series in $G$, so we cannot get an explicit formula for it. We can write it as
an explicit formal power series in $\phi$ which is invariant under the action of $\mathcal{SG}$.

Thus we see the confluence of three apparently unrelated questions: an action
principle for the planar limit of matrix models (our main interest), cohomology
of the automorphism group of formal power series and entropy of non-commutative
variables.

Voiculescu has a somewhat different approach to defining the entropy
of noncommuting random variables. Up to some additive constant his definition
seems to agree with ours.

7 Example: Two-Matrix Models

Let us consider a quartic multi-matrix model with action

$S(A) = -\mathrm{tr}\left[\frac{1}{2}K^{ij}A_iA_j + \frac{1}{4}g^{ijkl}A_iA_jA_kA_l\right].$ (70)

Our reference action is the gaussian* $S_0(A) = -\mathrm{tr}\,\frac{1}{2}\delta^{ij}A_iA_j$. We are interested
in estimating the greens functions and vacuum energy in the large N limit:

$E^{exact} = -\lim_{N\to\infty}\frac{1}{N^2}\log\frac{Z}{Z_0}$ (71)

where $Z$ and $Z_0$ are partition functions for $S$ and $S_0$. Choose the linear change
of variable $A_i\mapsto\phi_i(A) = \phi_i^jA_j$. The variational matrix $\phi_i^j$ that maximizes $\Omega$
determines the multi-variable Wigner distribution that best approximates the
quartic matrix model. For a linear change of variables,

$\Omega(\phi) = \mathrm{tr}\log[\phi_i^j] - \frac{1}{2}K^{ij}G_{ij} - \frac{1}{4}g^{ijkl}G_{ijkl}.$ (72)

Here $G_{ij} = \phi_i^k\phi_j^k$ and $G_{ijkl} = G_{ij}G_{kl} + G_{il}G_{jk}$ are the greens functions of
$S_0(\phi^{-1}(A))$. Thus, the matrix elements of $G$ may be regarded as the variational
parameters and the condition for an extremum is

$\frac{1}{2}K^{pq} + \frac{1}{4}\left[g^{pqkl}G_{kl} + g^{ijpq}G_{ij} + g^{pjkq}G_{jk} + g^{ipql}G_{il}\right] = \frac{1}{2}[G^{-1}]^{pq}.$ (73)

This is a non-linear equation for the variational matrix $G$, reminiscent of the
self consistent equation for a mean field. To test our variational approach, we
specialize to a two matrix model for which some exact results are known from
the work of Mehta [24].

*In the language of non-commutative probability theory we used earlier, what we call the
gaussian in this section is really the multivariate wignerian distribution. There should be no
confusion, since the wignerian moments are realized by a gaussian distribution of matrices.

7.1 Mehta's Quartic Two-Matrix Model 

Consider the action

$S(A,B) = -\mathrm{tr}\left[\frac{1}{2}(A^2 + B^2 - cAB - cBA) + \frac{g}{4}(A^4 + B^4)\right],$ (74)

which corresponds to the choices $K^{ij} = \begin{pmatrix}1 & -c\\ -c & 1\end{pmatrix}$, $g^{1111} = g^{2222} = g$ and
$g^{ijkl} = 0$ otherwise.* We restrict to $|c| < 1$, where $K^{ij}$ is a positive matrix.
Since $S(A,B) = S(B,A)$ and $G_{AB} = G_{BA}^*$, we may take

$G_{ij} = \begin{pmatrix}\alpha & \beta\\ \beta & \alpha\end{pmatrix}$ (75)

with $\alpha, \beta$ real. For $g > 0$, $\Omega$ is bounded above if $G_{ij}$ is positive. Its maximum
occurs at $(\alpha,\beta)$ determined by $\beta = \frac{c\alpha}{1+2g\alpha}$ and

$4g^2\alpha^3 + 4g\alpha^2 + (1 - c^2 - 2g)\alpha - 1 = 0.$ (76)

We must pick the real root $\alpha(g,c)$ that lies in the physical region $\alpha > 0$. Thus,
the gaussian ansatz determines the vacuum energy ($E(g,c) = -\frac{1}{2}\log(\alpha^2-\beta^2)$)
and all the greens functions (e.g. $G_{AA} = \alpha$, $G_{AB} = \beta$, $G_{AAAA} = 2\alpha^2$, etc.)
approximately.
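The gaussian ansatz for this model reduces to finding one positive root of the cubic (76). A minimal numerical sketch is given below, assuming $g \ge 0$ and $|c| < 1$; the bisection solver and all function names are hypothetical helpers introduced for illustration, and the energy uses the expression $E = -\frac{1}{2}\log(\alpha^2-\beta^2)$ quoted in the text.

```python
import math

def cubic(alpha, g, c):
    """Left-hand side of 4 g^2 a^3 + 4 g a^2 + (1 - c^2 - 2 g) a - 1 = 0."""
    return 4 * g**2 * alpha**3 + 4 * g * alpha**2 + (1 - c**2 - 2 * g) * alpha - 1

def solve_alpha(g, c):
    """Positive root by bisection; assumes g >= 0 and |c| < 1."""
    lo, hi = 0.0, 1.0
    while cubic(hi, g, c) < 0:      # cubic(0) = -1 < 0, so expand until the sign changes
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cubic(mid, g, c) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gaussian_ansatz(g, c):
    """Return (alpha, beta, E) for the gaussian variational ansatz."""
    alpha = solve_alpha(g, c)
    beta = c * alpha / (1 + 2 * g * alpha)
    energy = -0.5 * math.log(alpha**2 - beta**2)
    return alpha, beta, energy

alpha0, beta0, energy0 = gaussian_ansatz(0.0, 0.5)   # free (g = 0) limit
alpha1, beta1, energy1 = gaussian_ansatz(0.1, 0.5)
print(alpha0, beta0, energy0)
print(alpha1, beta1, energy1)
```

At $g = 0$ and $c = \frac{1}{2}$ this gives $\alpha = 4/3$, $\beta = 2/3$ and $E = \frac{1}{2}\log\frac{3}{4}$. The cubic has exactly one positive root for $g > 0$, since the product of its roots is positive while their sum is negative.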

By contrast, only a few observables of this model have been calculated exactly.
Mehta [24]† obtains the exact vacuum energy $E^{ex}(g,c)$ implicitly, as
the solution of a quintic equation. $G^{ex}_{AB}$ and $G^{ex}_{AAAA}$ may be obtained by
differentiation. As an illustration, we compare with Mehta's results in the weak and
strong coupling regions:

Weak coupling ($c = \frac{1}{2}$):

$E^{ex}(g,\frac{1}{2}) = -.144 + 1.78g - 8.74g^2 + \cdots$
$G^{ex}_{AB}(g,\frac{1}{2}) = \frac{2}{3} - 4.74g + 53.33g^2 + \cdots$
$G^{ex}_{AAAA}(g,\frac{1}{2}) = \frac{32}{9} - 34.96g + \cdots$
$E^{var}(g,\frac{1}{2}) = -.144 + 3.56g - 23.75g^2 + \cdots$
$G^{var}_{AB}(g,\frac{1}{2}) = \frac{2}{3} - 4.74g + 48.46g^2 + \cdots$
$G^{var}_{AAAA}(g,\frac{1}{2}) = \frac{32}{9} - 31.61g + 368.02g^2 + \cdots$

Strong coupling ($g\to\infty$):

$E^{ex}(g,c) = \frac{1}{2}\log g + \frac{1}{2}\log 3 - \frac{3}{4} + \cdots$
$E^{var}(g,c) = \frac{1}{2}\log g + \frac{1}{2}\log 2 + \cdots$, etc. (77)

*Kazakov relates this model to the Ising model on the collection of all planar lattices with
coordination number four [25].
†Some other special classes of greens functions are also accessible.



We see that the gaussian variational ansatz provides a reasonable first
approximation in both the weak and strong coupling regions. The gaussian
variational ansatz is not good near singularities of the free energy (phase transitions).
As $|c|\to1^-$, the energy $E^{ex}$ diverges; this is not captured well by the gaussian
ansatz. This reinforces our view that the gaussian variational ansatz is the
analogue of mean field theory.

7.2 Two-Matrix Model with tr $[A,B]^2$ Interaction

The power of our variational methods is their generality. We present an
approximate solution to a two matrix model, for which we could find no exact results
in the literature. The action we consider is:

$S(A,B) = -\mathrm{tr}\left[\frac{m^2}{2}(A^2+B^2) + \frac{c}{2}(AB+BA) - \frac{g}{4}[A,B]^2\right].$ (78)

This is a caricature of the Yang-Mills action. A super-symmetric version of
this model is also of interest. Consider the regime where $K^{ij} = \begin{pmatrix}m^2 & c\\ c & m^2\end{pmatrix}$
is positive, $k = (m^4 - c^2) > 0$. As before, we pick a gaussian
ansatz and maximize $\Omega$. We get $\beta = -\frac{c}{m^2}\alpha$ and

$G_{AA} = G_{BB} = \alpha = \frac{m^2}{2g}\left[\sqrt{1+\frac{4g}{k}} - 1\right],$

$E = -\frac{1}{2}\log\left[\frac{k + 2g - \sqrt{k^2+4kg}}{2g^2}\right].$ (79)

All other mean field greens functions can be expressed in terms of $\alpha$.

It is possible to improve on this gaussian variational ansatz by using
non-linear transformations. It is much easier to find first a gaussian approximation
and then expand around it in a sort of loop expansion. This is the analogue of
the usual Goldstone methods of many body theory. We have performed such
calculations for these multimatrix models, but we will not report on them in
this paper for the sake of brevity. The results are qualitatively the same. In
a later section (an appendix) we will give the departures from the gaussian
ansatz in the case of the single-matrix models.

8 Appendix: Group cohomology

Given a group $G$ and a $G$-module $V$ (i.e., a representation of $G$ on a vector
space $V$), we can define a cohomology theory [20]. The r-cochains are functions

$f: G^r\to V.$ (80)

The coboundary is

$df(g_1,g_2,\cdots g_{r+1}) = g_1f(g_2,\cdots g_{r+1}) + \sum_{s=1}^{r}(-1)^sf(g_1,g_2,\cdots g_{s-1},g_sg_{s+1},g_{s+2},\cdots g_{r+1}) + (-1)^{r+1}f(g_1,\cdots,g_r).$ (81)

It is straightforward to check that $d^2f = 0$ for all $f$. A chain $c$ is a cocycle or is
closed if $dc = 0$; a cocycle is exact or is a coboundary if $c = df$ for some $f$. The
rth cohomology of $G$ twisted by the module $V$, $H^r(G,V)$, is the space of closed
chains modulo exact chains. $H^0(G,V)$ is the space of invariant elements in $V$;
i.e., the space of $v$ satisfying $gv - v = 0$ for all $g\in G$. A 1-cocycle is a function
$c: G\to V$ satisfying

$c(g_1g_2) = g_1c(g_2) + c(g_1).$ (82)

Solutions to this equation modulo 1-coboundaries (which are of the form $b(g) = (g-1)v$ for some $v\in V$) form the first cohomology $H^1(G,V)$. If $G$ acts trivially
on $V$, a cocycle is just a homomorphism of $G$ to the additive group of $V$:
$c(g_1g_2) = c(g_2) + c(g_1)$.

A 1-cocycle gives a way of turning a representation on $V$ into an affine
action:

$(g,v)\mapsto gv + c(g).$ (83)

If $c(g)$ is a coboundary (i.e., $c(g) = (g-1)u$ for some $u$), this affine action is
really a linear representation in disguise: if the origin is shifted by $u$ we can
reduce it to a linear representation. Thus the elements of $H^1(G,V)$ describe
'true' affine actions on $V$. For example let $G$ be the loop group of a Lie group
$G'$, the space of smooth functions from the circle to $G'$: $G = S^1G' = \{g: S^1\to G'\}$.
Let $V = S^1\underline{G}'$ be the corresponding loop of the Lie algebra of $G'$. Then
there is an obvious adjoint representation of $G$ on $V$; a non-trivial 1-cocycle is
$c(g) = g\,dg^{-1}$, $d$ being the exterior derivative on the circle:

$c(g_1g_2) = g_1[g_2d(g_2^{-1})]g_1^{-1} + g_1dg_1^{-1} = \mathrm{ad}\,g_1\,c(g_2) + c(g_1).$ (84)



9 Appendix: A Single Random Matrix 

In the special case where there is only one matrix ($M = 1$), there is a probability
distribution on the real line $\rho(x)dx$ such that

$G_n = \int x^n\rho(x)dx.$ (85)

This follows because the $G_n$ satisfy the positivity condition

$\sum_{m,n=0}^{\infty}G_{n+m}u_m^*u_n\ge0;$ (86)

up to technical conditions, any sequence of real numbers satisfying this condition
determines a probability distribution on the real line. (This is the classical
moment problem solved in the nineteenth century [18].) There is an advantage
to transforming the factorized SD equations into an equation for $\rho(x)$: it becomes
a linear integral equation, the Mehta-Dyson equation [10]:

$2\mathcal{P}\int\frac{\rho(y)dy}{x-y} + S'(x) = 0.$ (87)



Moreover, the solution [22] to this equation can be expressed in purely
algebraic terms*

$\rho(x) = -\frac{1}{2\pi}\theta(a\le x\le b)\sqrt{(x-a)(b-x)}\left\lfloor\frac{S'(x)}{\sqrt{(x-a)(x-b)}}\right\rfloor.$ (88)

*For a Laurent series $X(z) = \sum_{k=-\infty}^{m}X_kz^k$ around infinity, $\lfloor X(z)\rfloor = \sum_{k=0}^{m}X_kz^k$
denotes the part that is a polynomial in $z$. This is analogous to the 'integer part' of a real
number, which explains the notation.

The numbers $a$ and $b$ are solutions of the algebraic equations

r,5=0 

J2[r + s]SrJ-^^a^b^ = 2 (89) 
' rlsl 

r,s=0 

where $(\frac{1}{2})_r = \frac{1}{2}(\frac{1}{2}+1)\cdots(\frac{1}{2}+r-1)$. The simplest example is the case of the
Wigner distribution. It is the analogue of the Gaussian in the world of
non-commutative probability distributions. For, if we choose the matrix elements of
$A$ to be independent Gaussians, $S(A) = -\frac{1}{2}\mathrm{tr}\,A^2$, we get the distribution
function for the eigenvalues of $A$ to be (in the large N limit)

$\rho_0(x) = \frac{1}{2\pi}\sqrt{4-x^2}\;\theta(|x|<2).$ (90)

The odd moments vanish; the even moments are then given by the Catalan
numbers

$G_{2k} = C_k = \frac{1}{k+1}\binom{2k}{k}.$ (91)
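The Catalan form of the Wigner moments is easy to confirm numerically. The sketch below integrates $x^n\rho_0(x)$ with the substitution $x = 2\cos\theta$, which turns the measure into $\frac{2}{\pi}\sin^2\theta\,d\theta$ on $[0,\pi]$; the quadrature rule, the step count and the function names are illustrative choices, not part of the text.

```python
import math

def catalan(k):
    """C_k = binom(2k, k) / (k + 1)."""
    return math.comb(2 * k, k) // (k + 1)

def wigner_moment(n, steps=2000):
    """Integral of x^n sqrt(4 - x^2) / (2 pi) over [-2, 2], midpoint rule in theta."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += (2.0 * math.cos(theta)) ** n * math.sin(theta) ** 2
    return 2.0 / math.pi * total * h

for k in range(6):
    print(2 * k, wigner_moment(2 * k), catalan(k))
```

The integrand is a trigonometric polynomial in $\theta$, so the midpoint rule reproduces the moments essentially to rounding error.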



The Mehta-Dyson equation follows from maximizing the 'action'

$\Omega(\rho) = \int\rho(x)S(x)dx + \mathcal{P}\int\log|x-y|\,\rho(x)\rho(y)dxdy$ (92)

with respect to $\rho(x)$. Then the generating function $\log Z(S)$ is the maximum of
this functional over all probability distributions $\rho$. The physical meaning of the
first term is clear: it is just the expectation value of the action of the original
matrix model:

$\int\rho(x)S(x)dx = \sum_nG_nS_n.$ (93)

The second term can be thought of as the 'entropy' which arises because we
have lost the information about the angular variables in the matrix variable: the
function $\rho(x)$ is the density of the distribution of the eigenvalues of $A$. Indeed,
$\sum_{i\neq j}\log|a_i-a_j|$ is (up to a constant depending only on $N$) the log of the volume
of the space of all hermitean matrices with spectrum $a_1, a_2\cdots a_N$. The entropy
$\mathcal{P}\int\log|x-y|\,\rho(x)\rho(y)dxdy$ is the large N limit of this quantity. Note that the
entropy is independent of the choice of the matrix model action: it is a universal
property of all one-matrix models. The meaning of the variational principle is
now clear: we seek the probability distribution of maximum entropy that has a
given set of moments $G_r$ for $r = 1,\cdots n$. The coefficients of the polynomial $S$
are just the Lagrange multipliers that enforce this condition. Thus we found
a variational principle, but indirectly in terms of the function $\rho(x)$ rather than
the moments $G_n$ themselves. The entropy could not be expressed explicitly in
terms of the moments. Indeed, in a sense, this is impossible:

The entropy cannot be expressed as a formal power series in $G_n$. This
is surprising since there appears to be a linear relation between $\rho(x)$ and $G_n$,
since $G_n = \int x^n\rho(x)dx$; also the entropy is a quadratic function of $\rho(x)$. So one
might think that entropy is a quadratic function of the $G_n$ as well. But if we try
to compute this function we will get a divergent answer. Indeed, we claim that
even if we don't require the series to converge, the entropy cannot be expressed
as a power series in $G_n$.

By thinking in terms of the change of variables that brings the probability
distribution we seek to a standard one, we can find an explicit formula for
entropy. Since we are interested in polynomial actions $S(A)$, which are
modifications of the quadratic action $\frac{1}{2}A^2$, the right choice of this reference
distribution is the Wigner distribution

$\rho_0(x) = \frac{1}{2\pi}\sqrt{4-x^2}\;\theta(|x|<2).$ (94)

There should thus be a change of variable $\phi(x)$ such that

$G_k = \int\phi(x)^k\rho_0(x)dx;$ (95)

in other words,

$\rho(\phi(x))\phi'(x) = \rho_0(x).$ (96)

Then we get

$\chi(\phi) = \mathcal{P}\int\rho_0(x)\rho_0(y)\log\left[\frac{\phi(x)-\phi(y)}{x-y}\right]dxdy.$ (97)

We have dropped a constant term, the entropy of the reference distribution
itself. Also it will be convenient to choose the constant of integration such that
$\phi(0) = 0$. Now we can regard the diffeomorphism as parametrized by its Taylor
coefficients

$\phi(x) = \sum_{n=1}^{\infty}\phi_nx^n,\quad\phi_1>0.$ (98)

Although we cannot express the entropy in terms of the moments $G_n$ themselves,
we will be able to express both the entropy and the moments in terms of
the parameters $\phi_n$. Thus we have a 'parametric form' of the variational problem.
It is this parametric form that we can extend to the case of multi-matrix
models. Indeed,

$\phi(x)^k = \sum_{n=1}^{\infty}x^n\sum_{l_1+l_2+\cdots l_k=n}\phi_{l_1}\cdots\phi_{l_k}$ (99)

so that

$G_k = \sum_{n=1}^{\infty}\Gamma_n\sum_{l_1+l_2+\cdots l_k=n}\phi_{l_1}\cdots\phi_{l_k}.$ (100)



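It is convenient to factor out the linear transformation $\phi(x) = \phi_1[x + \tilde\phi(x)]$, where $\tilde\phi(x) = \sum_{k=2}^{\infty}\tilde\phi_kx^k$, with $\tilde\phi_k = \phi_k[\phi_1]^{-1}$. Then

$\log\left[\frac{\phi(x)-\phi(y)}{x-y}\right] = \log\phi_1 + \log\left[1 + \sum_{m=2}^{\infty}\tilde\phi_m\frac{x^m-y^m}{x-y}\right].$ (101)

Using

$\frac{x^m-y^m}{x-y} = \sum_{k+1+l=m;\ k,l\ge0}x^ky^l$ (102)

and expanding the logarithm we get

$\log\left[\frac{\phi(x)-\phi(y)}{x-y}\right] = \log\phi_1 + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sum_{k_1,l_1,\cdots k_n,l_n}\tilde\phi_{k_1+1+l_1}\cdots\tilde\phi_{k_n+1+l_n}\,x^{k_1+\cdots k_n}y^{l_1+\cdots l_n}.$ (103)

It follows then that

$\Omega(\phi) = \sum_{k,n=1}^{\infty}S_k\Gamma_n\sum_{l_1+l_2+\cdots l_k=n}\phi_{l_1}\cdots\phi_{l_k} + \log\phi_1 + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sum_{k_1,l_1,\cdots k_n,l_n}\tilde\phi_{k_1+1+l_1}\cdots\tilde\phi_{k_n+1+l_n}\,\Gamma_{k_1+\cdots k_n}\Gamma_{l_1+\cdots l_n}.$ (104)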

While this formula may not be particularly transparent, it does accomplish
our goal of finding a variational principle that determines the moments. The
parameters $\phi_k$ characterize the probability distribution of the eigenvalues: they
determine the moments $G_n$ by the above series. By extremizing the action $\Omega$
as a function of these $\phi_k$, we can then determine the moments. We will be able
to generalize this version of the action principle to multi-matrix models. In
practice we would choose some simple function $\phi(x)$ such as a polynomial to get
an approximate solution to this variational problem. Since all one-matrix models
are exactly solvable, we can use them to test the accuracy of our variational
approximation.

9.1 Explicit Variational Calculations 

Consider the quartic one matrix model. Its exact solution in the large N limit
is known from the work of Brezin et al. [22]:

$Z(g) = \int dA\,e^{N\mathrm{tr}[-\frac{1}{2}A^2-gA^4]},$
$E_{exact}(g) = -\lim_{N\to\infty}\frac{1}{N^2}\log\frac{Z(g)}{Z(0)}.$ (105)

The gaussian with unit covariance is our reference action. Choose as a
variational ansatz the linear change of variable $\phi(x) = \phi_1x$, which merely scales the
Wigner distribution. The $\phi_1$ that maximizes $\Omega$ represents the Wigner distribution
that best approximates the quartic matrix model.

$\Omega(\phi_1) = \log\phi_1 - \frac{1}{2}G_2 - gG_4.$ (106)

Here $G_{2k} = \phi_1^{2k}C_k$. Letting $\alpha = \phi_1^2$, $\Omega(\alpha) = \frac{1}{2}\log\alpha - \frac{\alpha}{2} - 2g\alpha^2$ is bounded
above only for $g\ge0$. Its maximum occurs at $\alpha(g) = \frac{-1+\sqrt{1+32g}}{16g}$. Notice that
$\alpha$ is determined by a non-linear equation. This is reminiscent of mean field
theory; we will sometimes refer to the gaussian ansatz as mean field theory.
Our variational estimates are:

$E(g) = -\frac{1}{2}\log\left[\frac{-1+\sqrt{1+32g}}{16g}\right],$
$G_{2k}(g) = \left(\frac{-1+\sqrt{1+32g}}{16g}\right)^kC_k.$ (107)

The exact results from [22] are

$G^{ex}_{2k}(g) = \frac{(2k)!}{k!(k+2)!}\,a^{2k}(g)\,[2k+2-ka^2(g)],$ (108)

where $a^2(g) = \frac{1}{24g}[-1+\sqrt{1+48g}]$. In both cases, the vacuum energy is analytic
at $g = 0$ with a square root branch point at a negative critical coupling. The
mean field critical coupling $g_c^{MF} = -\frac{1}{32}$ is 50% more than the exact value
$g_c^{ex} = -\frac{1}{48}$.

The distribution of eigenvalues of the best gaussian approximation is given
by $\rho_g(x) = \phi_1^{-1}\rho_0(\phi_1^{-1}x)$ where $\rho_0(x) = \frac{1}{2\pi}\sqrt{4-x^2}$, $|x|\le2$ is the standard
Wigner distribution. The exact distribution

$\rho_{ex}(x,g) = \frac{1}{\pi}\left(\frac{1}{2}+4ga^2(g)+2gx^2\right)\sqrt{4a^2(g)-x^2},\quad|x|\le2a(g),$ (109)

is compared with the best gaussian approximation in figure 1. The latter does
not capture the bimodal property of the former.

The vacuum energy estimate starts out for small $g$, being twice as big as
the exact value. But the estimate improves and becomes exact as $g\to\infty$.
Meanwhile, the estimate for $G_2(g)$ is within 10% of its exact value for all $g$.
$G_{2k}$, $k\ge2$ for the gaussian ansatz do not have any new information. However,
the higher cumulants vanish for this ansatz.

We see that a gaussian ansatz is a reasonable first approximation, and is
not restricted to small values of the coupling $g$. To improve on this, get a
non-trivial estimate for the higher cumulants and capture the bimodal distribution
of eigenvalues, we need to make a non-linear change of variable.
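The mean field estimates of this subsection can be tabulated against the exact moments with a few lines of code. The sketch assumes the closed forms $\alpha(g) = \frac{-1+\sqrt{1+32g}}{16g}$ for the variational scale and $a^2(g) = \frac{-1+\sqrt{1+48g}}{24g}$ for the exact one, together with the moment formulas (107) and (108); the function names and the sample couplings are hypothetical.

```python
import math

def catalan(k):
    return math.comb(2 * k, k) // (k + 1)

def alpha_mf(g):
    """Mean field scale: positive root of 8 g a^2 + a - 1 = 0."""
    return (-1.0 + math.sqrt(1.0 + 32.0 * g)) / (16.0 * g)

def a2_exact(g):
    """Exact scale a^2(g) entering the exact moments."""
    return (-1.0 + math.sqrt(1.0 + 48.0 * g)) / (24.0 * g)

def g2k_var(k, g):
    """Variational moment G_2k = alpha^k C_k."""
    return alpha_mf(g) ** k * catalan(k)

def g2k_exact(k, g):
    """Exact moment G_2k = (2k)!/(k!(k+2)!) a^(2k) [2k + 2 - k a^2]."""
    a2 = a2_exact(g)
    pref = math.factorial(2 * k) / (math.factorial(k) * math.factorial(k + 2))
    return pref * a2 ** k * (2 * k + 2 - k * a2)

def energy_var(g):
    """Variational vacuum energy -(1/2) log alpha."""
    return -0.5 * math.log(alpha_mf(g))

couplings = [0.01, 0.1, 1.0, 10.0, 100.0]
ratios = [g2k_var(1, g) / g2k_exact(1, g) for g in couplings]
for g, r in zip(couplings, ratios):
    print(g, energy_var(g), r)
```

For these sample couplings the ratio of the variational to the exact second moment stays within ten percent of one, in line with the statement above.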

9.2 Non-linear Variational Change of Variables 

The simplest non-linear ansatz for the quartic model is a cubic polynomial:
$\phi(x) = \phi_1x + \phi_3x^3$. A quadratic ansatz will not lower the energy since $S(A)$ is
even. Our reference distribution is still the standard Wigner distribution. $\phi_{1,3}$
are determined by the condition that

$\Omega(\phi_1,\phi_3) = \mathcal{P}\int dx\,dy\,\rho_0(x)\rho_0(y)\log\left[\frac{\phi(x)-\phi(y)}{x-y}\right] - \int dx\,\rho_0(x)\left[\frac{1}{2}\phi(x)^2 + g\phi(x)^4\right]$ (110)

be a maximum. Considering the success of the linear change of variable, we
expect the deviations of $\phi_{1,3}$ from their mean field values $(\sqrt{\alpha}, 0)$ to be small,



irrespective of g. Within this approximation, we get (with a = 

Va(-3 + 2a + (1 - 32g)a^ + A8ga^ + lUg'^a^) 



/a - 



3 + 4a + (1 + 96.g)a2 + A8ga^ + A32g^a'i 
i;5(-2 -I- a) 



(111) 



^ 3 + 4a + (1 + 96.g)a2 + 48.ga3 -|- 432^204 ' 

from which we calculate the variational greens functions and vacuum energy.
The procedure we have used to obtain $\phi_{1,3}$ can be thought of as a 1 loop
calculation around mean field theory. Comparing with the exact results of [22], we
find the following qualitative improvements over the mean field ansatz.

In addition to the mean field branch cut from $-\infty$ to $g_c^{MF}$, the vacuum
energy now has a double pole at $g_c^{MF} < g_c^{1-loop} = -.0215138 < g_c^{ex}$. We can
understand this double pole as a sort of Padé approximation to a branch cut
that runs all the way up to $g_c^{ex}$. The vacuum energy variational estimate is
lowered for all $g$.

Figure 1: Eigenvalue Distribution. Dark curve is exact, semicircle is mean field
and bi-modal light curve is cubic ansatz at 1-loop.

Figure 1 demonstrates that the cubic ansatz is able to capture the bimodal
nature of the exact eigenvalue distribution. If $\chi(x) = \phi^{-1}(x)$, then $\rho(x) = \rho_0(\chi(x))\chi'(x)$, where $\rho_0(x) = \frac{1}{2\pi}\sqrt{4-x^2}$, $|x|\le2$.

The greens functions $G_2$, $G_4$ are now within a percent of their exact values,
for all $g$. More significantly, the connected 4-point function $G_4^c = G_4 - 2(G_2)^2$
which vanished for the gaussian ansatz, is non-trivial, and within 10% of its
exact value, across all values of $g$.

9.3 Formal Power Series in One Variable 

Given a sequence of complex numbers $(a_0, a_1, a_2,\cdots)$, with only a finite number
of non-zero entries, we have a polynomial with these numbers as coefficients:

$a(z) = \sum_{n=0}^{\infty}a_nz^n.$ (112)

Note that all the information in a polynomial is in its coefficients: the variable
$z$ is just a book-keeping device. In fact we could have defined a polynomial
as a sequence of complex numbers $(a_0, a_1,\cdots)$ with a finite number of
non-zero elements. The addition, multiplication and differentiation of polynomials can be
expressed directly in terms of these coefficients:

$[a+b]_n = a_n+b_n,\quad[ab]_n = \sum_{k+l=n}a_kb_l,\quad[Da]_n = (n+1)a_{n+1}.$ (113)

A formal power series $(a_0, a_1,\cdots)$ is a sequence of complex numbers, with
possibly an infinite number of non-zero terms. We define the sum, product and
derivative as for polynomials above:

$[a+b]_n = a_n+b_n,\quad[ab]_n = \sum_{k+l=n}a_kb_l,\quad[Da]_n = (n+1)a_{n+1}.$ (114)

The set of formal power series is a ring, indeed even an integral domain. (The
proof is the same as above for polynomials.) The operation $D$ is a derivation on
this ring. The ring of formal power series is often denoted by $C[[z]]$. The
idea is that such a sequence can be thought of as the coefficients of a series
$\sum_{n=0}^{\infty}a_nz^n$; the sum and product postulated are what you would get from this
interpretation. However, the series may not converge if $z$ is thought of as
a complex number: hence the name formal power series.
The composition $a\circ b$ is well-defined whenever $b_0 = 0$:

$[a\circ b]_n = \sum_{k=0}^{\infty}a_k\sum_{l_1+\cdots+l_k=n}b_{l_1}b_{l_2}\cdots b_{l_k}.$ (115)

The point is that, for each $n$ there are only a finite number of such $l$'s so that
the series on the rhs is really a finite series. In terms of series, this means we
substitute one series into the other:

$a\circ b(z) = a(b(z)).$ (116)
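The coefficient rules above translate directly into code on truncated coefficient lists. The following sketch implements the Cauchy product and the composition formula (115) up to a chosen order; the truncation order, the function names and the test series are assumptions made for the example.

```python
def series_mul(p, q, order):
    """Cauchy product [pq]_n = sum_{k+l=n} p_k q_l, truncated at z**order."""
    out = [0] * (order + 1)
    for k, pk in enumerate(p[: order + 1]):
        if pk == 0:
            continue
        for l, ql in enumerate(q[: order + 1 - k]):
            out[k + l] += pk * ql
    return out

def compose(a, b, order):
    """Coefficients of a(b(z)) up to z**order; requires b_0 = 0."""
    if b and b[0] != 0:
        raise ValueError("composition needs b_0 = 0")
    result = [0] * (order + 1)
    power = [1] + [0] * order          # b(z)**0
    for k in range(order + 1):
        a_k = a[k] if k < len(a) else 0
        for n in range(order + 1):
            result[n] += a_k * power[n]
        power = series_mul(power, b, order)   # b(z)**(k+1), which starts at z**(k+1)
    return result

geometric = [1] * 7                      # 1/(1 - z) through z**6
fib = compose(geometric, [0, 1, 1], 6)   # 1/(1 - z - z**2) through z**6
print(fib)
```

Because $b_0 = 0$, the power $b(z)^k$ starts at $z^k$, so only finitely many terms contribute to each coefficient, which is the finiteness remark made in the text.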

9.4 The Group of automorphisms 

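The set of formal power series

$\mathcal{G} = \left\{\phi(x) = \sum_{n=0}^{\infty}\phi_nx^n\ \Big|\ \phi_0 = 0;\ \phi_1\ne0\right\}$ (117)

is a group under composition: the group of automorphisms. The group law is

$[\tilde\phi\circ\phi]_n = \sum_{k=1}^{n}\tilde\phi_k\sum_{l_1+l_2\cdots+l_k=n}\phi_{l_1}\cdots\phi_{l_k}.$ (118)

The inverse of $\phi$ (say $\tilde\phi$) is determined by the recursion relations of Lagrange:

$\tilde\phi_1 = \frac{1}{\phi_1},\qquad\tilde\phi_n = -\frac{1}{\phi_1^n}\sum_{k=1}^{n-1}\tilde\phi_k\sum_{l_1+\cdots+l_k=n}\phi_{l_1}\cdots\phi_{l_k}\quad(n\ge2).$ (119)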
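One way to compute the inverse order by order is to demand that the composition $[\tilde\phi\circ\phi]_n$ vanish for $n\ge2$ and equal 1 for $n = 1$, which gives $\tilde\phi_1 = 1/\phi_1$ and $\tilde\phi_n = -\phi_1^{-n}\sum_{k<n}\tilde\phi_k[\phi^k]_n$. The sketch below implements this recursion with exact rationals; it is derived from the group law and is not claimed to be the exact form of the recursion intended by the authors. All names are hypothetical.

```python
from fractions import Fraction

def series_mul(p, q, order):
    """Truncated Cauchy product of two coefficient lists."""
    out = [Fraction(0)] * (order + 1)
    for k, pk in enumerate(p[: order + 1]):
        for l, ql in enumerate(q[: order + 1 - k]):
            out[k + l] += pk * ql
    return out

def compositional_inverse(phi, order):
    """Coefficients psi_0..psi_order with psi(phi(x)) = x + O(x**(order+1))."""
    phi = ([Fraction(c) for c in phi] + [Fraction(0)] * (order + 1))[: order + 1]
    if phi[0] != 0 or phi[1] == 0:
        raise ValueError("need phi_0 = 0 and phi_1 != 0")
    # powers[k][n] is the coefficient of x**n in phi(x)**k
    powers = [[Fraction(1)] + [Fraction(0)] * order]
    for _ in range(order):
        powers.append(series_mul(powers[-1], phi, order))
    psi = [Fraction(0)] * (order + 1)
    psi[1] = 1 / phi[1]
    for n in range(2, order + 1):
        s = sum(psi[k] * powers[k][n] for k in range(1, n))
        psi[n] = -s / phi[1] ** n
    return psi

inv = compositional_inverse([0, 1, 1], 5)   # inverse of x + x^2
print(inv)
```

For $\phi(x) = x + x^2$ the result is $x - x^2 + 2x^3 - 5x^4 + 14x^5$, the expansion of $\frac{-1+\sqrt{1+4x}}{2}$ with signed Catalan coefficients.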



$\mathcal{G}$ is a topological group with respect to the ultrametric topology. It can
be thought of as a Lie group, the coefficients being the co-ordinates. The
group multiplication law can now be studied in the case where the left or the
right element is infinitesimal, leading to two sets of vector fields on the group
manifold. For example, if $\tilde\phi(x) = x + \epsilon x^{k+1}$, the change it would induce on the
co-ordinates of $\phi$ is

$$[\mathcal{L}_k \phi]_n = \sum_{l_1 + \cdots + l_{k+1} = n} \phi_{l_1} \cdots \phi_{l_{k+1}} \qquad (120)$$

or equivalently,

$$\mathcal{L}_k \phi(x) = \phi(x)^{k+1}, \quad \text{for } k = 0, 1, 2, \cdots. \qquad (121)$$

By choosing $\tilde\phi(x) = x + \epsilon x^{k+1}$ in $\phi \circ \tilde\phi$ we get the infinitesimal right action:

$$\mathcal{R}_k \phi(x) = x^{k+1} D\phi(x), \quad \text{for } k = 0, 1, 2, \cdots. \qquad (122)$$
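Both sets satisfy the commutation relations of the Lie algebra of $\mathcal{G}$:

$$[\mathcal{L}_m, \mathcal{L}_n] = (n - m)\, \mathcal{L}_{m+n}, \qquad [\mathcal{R}_m, \mathcal{R}_n] = (n - m)\, \mathcal{R}_{m+n}. \qquad (123)$$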

This Lie algebra is also called the Virasoro algebra by some physicists and the 
Witt algebra by some mathematicians. 

There is a representation of this Lie algebra on the space of formal power
series:

$$l_n a = x^{n+1} D a \qquad (124)$$
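As a quick consistency check of (124) against the commutation relations (123), one can apply two of these operators in succession and use the Leibniz rule for $D$. The following is a sketch of that computation, writing $l_n$ for the operator in (124):

```latex
\begin{align*}
l_m l_n a &= x^{m+1} D\left( x^{n+1} D a \right)
           = (n+1)\, x^{m+n+1} D a + x^{m+n+2} D^2 a, \\
[l_m, l_n]\, a &= \left[ (n+1) - (m+1) \right] x^{m+n+1} D a
                = (n - m)\, l_{m+n}\, a .
\end{align*}
```

The second-derivative terms are symmetric in $m$ and $n$ and cancel in the commutator, leaving a first-order operator of the same form.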






9.5 Cohomology of $\mathcal{G}$



Now let $V$ be the space of formal power series with real coefficients. Then $\mathcal{G}$,
the group of automorphisms, has a representation on $V$:

$$\mathcal{G} = \{\phi : Z_+ \to R \mid \phi_0 = 0,\ \phi_1 > 0\}, \qquad V = \{a : Z_+ \to R\}, \qquad \phi^* a(x) = a(\phi^{-1}(x)). \qquad (125)$$



Now, $\log[\phi(x)/x]$ is a power series in $x$ because $\phi(x)/x$ is a formal power
series with positive constant term: $[\phi(x)/x]_0 = \phi_1 > 0$. We see easily that
$c(\phi, x) = \log[\phi(x)/x]$ is a 1-cocycle of $\mathcal{G}$ twisted by the representation $V$:

$$c(\phi_1 \phi_2, x) = \log\left[\frac{\phi_1(\phi_2(x))}{x}\right] = \log\left[\frac{\phi_1(\phi_2(x))}{\phi_2(x)}\right] + \log\left[\frac{\phi_2(x)}{x}\right] = c(\phi_1, \phi_2(x)) + c(\phi_2, x). \qquad (126)$$



Of course, neither $\log \phi(x)$ nor $\log x$ are power series in $x$. So this cocycle is
non-trivial.
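The cocycle identity can also be checked numerically for concrete automorphisms evaluated at a small positive $x$, where the series converge. This is only an illustrative sketch: the two sample maps `phi1` and `phi2`, the evaluation point, and the function name `c` are choices made here, not taken from the text.

```python
import math


def c(phi, x):
    """c(phi, x) = log[phi(x) / x] for a map phi with phi(0) = 0 and phi'(0) > 0."""
    return math.log(phi(x) / x)


def phi1(x):
    return x / (1 - x)        # x + x^2 + x^3 + ..., leading coefficient 1 > 0


def phi2(x):
    return 2 * x + x ** 2     # leading coefficient 2 > 0


def phi12(x):
    return phi1(phi2(x))      # the group product is composition


x = 0.1
lhs = c(phi12, x)
rhs = c(phi1, phi2(x)) + c(phi2, x)
```

The two sides agree up to floating point rounding, reflecting the fact that the identity is just the additivity of the logarithm applied to $\phi_1(\phi_2(x))/x$.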

The space of formal power series in two commuting variables ($\mathrm{Sym}^2 V$) also
carries a representation of $\mathcal{G}$. We again have a non-trivial 1-cocycle on this
representation (see the footnote):

$$c(\phi, x, y) = \log\left[\frac{\phi(x) - \phi(y)}{x - y}\right] \qquad (128)$$



We recognize this as the entropy of the single matrix model. The same argument
shows that this is a non-trivial cocycle of $\mathcal{G}$.
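To spell out "the same argument" for the two-variable case, one can insert $\phi_2(x) - \phi_2(y)$ in the numerator and denominator, exactly as $\phi_2(x)$ was inserted in (126). A sketch of the resulting identity, assuming the representation acts on both variables by the same substitution:

```latex
\begin{align*}
c(\phi_1 \phi_2, x, y)
 &= \log\left[ \frac{\phi_1(\phi_2(x)) - \phi_1(\phi_2(y))}{x - y} \right] \\
 &= \log\left[ \frac{\phi_1(\phi_2(x)) - \phi_1(\phi_2(y))}{\phi_2(x) - \phi_2(y)} \right]
  + \log\left[ \frac{\phi_2(x) - \phi_2(y)}{x - y} \right] \\
 &= c(\phi_1, \phi_2(x), \phi_2(y)) + c(\phi_2, x, y).
\end{align*}
```

In the limit $y \to x$ the cocycle reduces to $\log \phi'(x)$, so the identity becomes the logarithm of the chain rule.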

Now we understand that the entropy of the single matrix models has the
mathematical meaning of a non-trivial 1-cocycle of the group of automorphisms.
It explains why we could not express the entropy as a function of the moments.
This points also to a solution to the difficulty: we must think in terms of the
automorphism $\phi$ rather than the moments as parametrizing the probability
distribution.

Footnote: The formula

$$\frac{x^n - y^n}{x - y} = \sum_{k=0}^{n-1} x^k y^{n-1-k} \qquad (127)$$

can be used to show that $c(\phi, x, y)$ is a formal power series in $x$ and $y$.



10 Appendix: Formula for Cocycle 

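We will now get an explicit formula for $\sigma(\phi, A)$. The Jacobian matrix of $\phi$ is
obtained by differentiating the series $\phi(A)_i = A_i + \sum_{n=2}^{\infty} \phi_i^{i_1 \cdots i_n} A_{i_1} \cdots A_{i_n}$:

$$J^{a\,j\,d}_{i\,b\,c}(A) = \frac{\partial \phi^a_{i\,b}(A)}{\partial A^c_{j\,d}} = \delta_i^j \delta^a_c \delta^d_b + \sum_{m+n \geq 1} \phi_i^{i_1 \cdots i_m\, j\, j_1 \cdots j_n} [A_{i_1} \cdots A_{i_m}]^a_c\, [A_{j_1} \cdots A_{j_n}]^d_b := \delta_i^j \delta^a_c \delta^d_b + K^{a\,j\,d}_{i\,b\,c}(A). \qquad (129)$$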

If we suppress the color indices a, b, c, d, 

$$J_i^j(A) = \delta_i^j\, 1 \otimes 1 + \phi_i^{I j J} A_I \otimes A_J := \delta_i^j\, 1 \otimes 1 + K_i^j(A). \qquad (130)$$

We can now compute 

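$$\frac{1}{N^2} \mathrm{tr}\, K^n = \frac{1}{N^2} K^{a_1 i_2 b_2}_{i_1 b_1 a_2} K^{a_2 i_3 b_3}_{i_2 b_2 a_3} \cdots K^{a_n i_1 b_1}_{i_n b_n a_1}$$

$$= \phi_{i_1}^{K_1 i_2 L_1} \phi_{i_2}^{K_2 i_3 L_2} \cdots \phi_{i_n}^{K_n i_1 L_n}\, \frac{1}{N} [A_{K_1}]^{a_1}_{a_2} [A_{K_2}]^{a_2}_{a_3} \cdots [A_{K_n}]^{a_n}_{a_1}\, \frac{1}{N} [A_{L_1}]^{b_2}_{b_1} [A_{L_2}]^{b_3}_{b_2} \cdots [A_{L_n}]^{b_1}_{b_n}$$

$$= \phi_{i_1}^{K_1 i_2 L_1} \phi_{i_2}^{K_2 i_3 L_2} \cdots \phi_{i_n}^{K_n i_1 L_n}\, \frac{1}{N} \mathrm{tr}[A_{K_1} \cdots A_{K_n}]\, \frac{1}{N} \mathrm{tr}[A_{L_n} \cdots A_{L_1}].$$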



Thus, 



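$$\sigma(\phi, A) = \frac{1}{N^2} \log\det[1 + K(A)] = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\, \frac{1}{N^2} \mathrm{tr}\, K^n$$

$$= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\, \phi_{i_1}^{K_1 i_2 L_1} \phi_{i_2}^{K_2 i_3 L_2} \cdots \phi_{i_n}^{K_n i_1 L_n}\, \frac{1}{N} \mathrm{tr}[A_{K_1} \cdots A_{K_n}]\, \frac{1}{N} \mathrm{tr}[A_{L_n} \cdots A_{L_1}].$$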

This is the formula we presented earlier. 
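The factorization of $\frac{1}{N^2} \mathrm{tr}\, K^n$ into a product of two normalized traces rests on a simple fact: the linear map $X \mapsto P X Q$ on $N \times N$ matrices has trace $\mathrm{tr} P \, \mathrm{tr} Q$. A minimal numerical sketch of that fact follows; the function names and the sample matrices are hypothetical and chosen only for illustration.

```python
def matmul(X, Y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def trace_of_sandwich_map(P, Q):
    """Trace of the linear map X -> P X Q, summed over the basis matrices E_ab."""
    n = len(P)
    total = 0
    for a in range(n):
        for b in range(n):
            # Basis matrix with a single 1 in position (a, b)
            E = [[1 if (r, col) == (a, b) else 0 for col in range(n)]
                 for r in range(n)]
            image = matmul(matmul(P, E), Q)
            total += image[a][b]          # diagonal entry of the map in this basis
    return total


P = [[1, 2], [3, 4]]
Q = [[5, 6], [7, 8]]
map_trace = trace_of_sandwich_map(P, Q)   # compare with trace(P) * trace(Q)
```

Applying the map $n$ times multiplies the left factors in one order and the right factors in the reverse order, which is why the second trace above carries the indices $L_n \cdots L_1$.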